FOREWORD


The book of Professor Evtushenko describes both the
theoretical foundations and the range of applications of many
important methods for solving nonlinear programs. Particularly
emphasized is their use for the solution of optimal control
problems for ordinary differential equations. These methods
were implemented in a library of programs for an interactive
system (DISO) at the Computing Center of the USSR Academy of
Sciences, which can be used to solve a given complicated
problem by a combination of appropriate methods in the
interactive mode. Many examples show the strong as well as the
weak points of particular methods and illustrate the
advantages gained by their combination. In fact, it is the
central aim of the author to point out the necessity of using
many techniques interactively in order to solve more
difficult problems.


A noteworthy feature of the book for the Western reader is the 
frequently unorthodox analysis of many known methods in the 
great tradition of Russian mathematics. 
PREFACE


Optimization methods are finding ever broader application in
science and engineering. Design engineers, automation and control
systems specialists, physicists processing experimental data,
economists, as well as operations research specialists are beginning
to employ them routinely in their work. The applications have in
turn furthered vigorous development of computational techniques
and engendered new directions of research. Practical implementation
of many numerical methods of high computational complexity is
now possible with the availability of high-speed large-memory
digital computers. Indeed, experience has shown that the most
efficient way of solving optimization problems is in an interactive
man-machine mode, allowing the use of a variety of optimization
techniques for any given problem.

This book deals with computational techniques (including
interactive man-machine methods) for solving nonlinear programming
problems as well as constrained optimal control problems.


The book has seven chapters. Chapter 1 reviews relevant parts of
convex analysis and derives necessary and sufficient conditions
for optimality in nonlinear programming. Chapter 2 deals with
numerical methods for solving systems of nonlinear equations,
including minimax solutions; these methods, of independent
interest, are used later on. Chapters 3, 4 and 5 treat
numerical methods for solving nonlinear programming problems.
Chapter 3 considers various modifications of the penalty-function
method. Chapter 4 deals with methods using several modifications
of the Lagrangian. Relaxation methods are described in Chapter 5.
Chapter 6 presents numerical methods for solving optimal control
problems with state constraints (including nondifferentiable
functionals), drawing on nonlinear programming techniques, in
particular modified Lagrangians, constrained gradients, gradient
projections, and Newton-Raphson, among others.




Chapter 7 is new, added in the English edition. It contains new
results on global numerical methods for a variety of problems (e.g.,
global minimization of multivariable functions, nonlinear
programming, multicriteria optimization, and solution of nonlinear
equations) based on the method of nonuniform covering. While available
in the Soviet literature, they are almost unknown in the West.
Various versions of the nonuniform-covering method can also be used
on parallel computers. Appendices I and II provide a review of
relevant results from Analysis, Linear Algebra, and Point-to-Set
Mapping Theory used in the book. The bibliography makes no claim
of completeness (the number of extant references is well over
a thousand) and lists only those papers and monographs used
directly in writing the book. Special attention has been given to those
methods which have been tested extensively in recent years at the
USSR Academy of Sciences Computing Center, and improved upon.


The author expresses his deep gratitude to N.N. Moiseev for his
encouragement in writing this book and his continued interest.
Extensive access to the Western literature on this subject was made
possible through the courtesy of Professor O. Hellman and the Turku
University Library. The author is grateful to his co-workers at the
Computing Center, O.P. Burdakov, A.I. Golikov, N.I. Zhadan, and
V.A. Purtov, for reading the manuscript and making many useful
comments.

O.P. Burdakov assisted in writing Section 2.6, and Section 6.7 was
written in collaboration with N.I. Grachev.


Notation 


Theorems, lemmas and definitions are labelled by a triple
number: the first digit stands for the chapter number, the second
digit for the section number, and the third digit is the ordinal
number of the theorem, lemma or definition in the section. The
formulas are labelled by two digits: the first digit stands for the
section number, the second digit is the number of the formula in
the section. If reference is made to a formula in another chapter,
one more digit is added to indicate the number of the chapter.


$x \in X$: x is an element of the set X;

$x \notin X$: x is not an element of the set X;

$X \subset Y$: the set X is a subset of Y; X = Y is not excluded;

$X \not\subset Y$: the set X is not contained in Y;

$X \cup Y$: the union of the sets X and Y;

$X \cap Y$: the intersection of the sets X and Y;

$X \setminus Y$: the difference between the sets X and Y, i.e.,
the set of elements of X which do not belong to Y;

$X \times Y$: the Cartesian product of the sets X and Y, i.e.,
the set of pairs (x, y) where $x \in X$, $y \in Y$;

int X: the set of interior points of the set X (see
Definition 1.1.2);

$\bar{X}$: the closure of the set X (see Definition 1.1.3);

$X = \emptyset$: the set X is empty;

$\{x : T\}$: the set of all elements x satisfying condition T;

$i \in [1:n]$: i satisfies the condition $1 \le i \le n$;

$x \in [a, b]$: x satisfies the condition $a \le x \le b$;

$f: X \to Y$: a one-to-one mapping of X onto Y;

$W: X \to 2^Y$: a multivalued mapping of X into Y (see
Appendix III);

$\forall$: reads "for all";

$\exists$: reads "there exists";

dis(x, y): the distance between the two points x and y;

dis(x, X): the distance between the point x and the set X;

G(q): the open neighborhood of the point q;

G(X): the open neighborhood of the set X;

$G_\varepsilon(X)$: the $\varepsilon$-neighborhood of the set X,
$G_\varepsilon(X) = \{x : \mathrm{dis}(x, X) < \varepsilon\}$;

$R^n$: the real linear (normed) n-dimensional space;

$E^n$: the Euclidean n-dimensional space;

$E^n_+$: the nonnegative orthant of $E^n$, i.e., the set of
all vectors of $E^n$ all the coordinates of which
are nonnegative;

$x \in R^n$: the vector x is an element of the space $R^n$;

$x^i$: the i-th coordinate of the vector x; in some
places, $x^{(i)}$ is written for greater clarity;


$e_i$: the i-th axis column vector, the i-th coordinate
of which is unity, the remaining ones zeros;

$x^T$: a transposed vector;

$A^T$: a transposed matrix;

$\|x\|$: the norm of the vector x; in most cases in the text,
the Euclidean norm is meant (for more detail, see
Appendix II);

$\|A\|$: the norm of the matrix A, adapted to the norm of
the vectors;

$\det A$: the determinant of the matrix A;

$p \ge 0$: all coordinates of the vector p are nonnegative;

$B > 0$: the symmetric matrix B of order n is positive
definite, i.e., $x^T B x > 0$ for any $x \in E^n$ such that
$\|x\| \ne 0$;

$B \ge 0$: the symmetric matrix B of order n is positive
semi-definite, i.e., $x^T B x \ge 0$ for any $x \in E^n$;

$D(z)$: a diagonal matrix whose i-th diagonal element is
the i-th coordinate of the vector z; the dimension
of the matrix D is determined by the dimension of
the vector z;

$I_s$: the identity matrix of order s;

$E^1_+$: the set of nonnegative real numbers;

$0_{nm}$: the zero $n \times m$ matrix (in many places in the text,
where this does not cause any misunderstanding, the
subscripts nm are omitted);
$[a, b] \ne 0$: at least one of the vectors $a \in R^n$, $b \in R^m$ is
nonzero;

$z = [a, b]$: shorthand notation for $z^T = [a^T, b^T]$, $z \in R^{n+m}$;

$\langle a, b \rangle$: the scalar product of the vectors a and b;

$h_+(x)$, $h_-(x)$: the vector functions whose i-th components
are defined by the formulas
$h_+^i(x) = \max[0, h^i(x)]$, $h_-^i(x) = \min[0, h^i(x)]$;


if) hive RK. then 
2 © ete ee 
(hv), = y mee Ol Shey ee 
i=2 


$f_x(x)$: the n-dimensional column vector the i-th component
of which is $\partial f(x)/\partial x^i$;

$f_{xx}(x)$: the square matrix of order n whose (i, j)-th
element is $\partial^2 f(x)/\partial x^i \partial x^j$;


the rectangular matrix whose Ca element is 


oxt 1 

adem vavive Of —Uhe StunictOn ss o= ne thes sea liar 
argument q; 

Re z: the real part of the complex number z;

Im z: the imaginary part of the complex number z;

$\bar{z}$: the conjugate of z;

$|z|$: the modulus of z;



loan 


iene 
K-00 


Ox 


Q; (x): 


Cue: 


{ORS Wine) ee 


$\overline{\lim}_{k \to \infty} x_k$: the upper limit of the sequence $\{x_k\}$;

$\underline{\lim}_{k \to \infty} x_k$: the lower limit of the sequence $\{x_k\}$;

$\dot{x}$: the differentiation with respect to the independent
variable t;

$\partial f(x)$: the set of subgradients of the function f at the
point x (see Definitions 1.2.1 and 1.2.2);
estimates of the rate of convergence (the defini- 
tion of them is given in Section 2.3); 

the set of active bounds of the inequality type at 
the spointy.%u.ece the Owe Mt licen Clea enoy) Das 


the Lagrange function in the nonlinear programming 


problem (1.6.1) (see the formula (1.6.4)); 


$\mathrm{Arg}\min_{x \in X} f(x)$: the set of all those points $x \in X$ at which the
minimum of the function f is attained on X;

sup: supremum;

inf: infimum;

///: the proof is completed.


Chapter 1 


AN INTRODUCTION TO OPTIMIZATION THEORY 


In this chapter we present definitions and basic theoretical
results to be used in developing and justifying numerical methods
for solving extremal problems. We also give necessary and
sufficient conditions for the extremum in various optimization problems.
The material of this chapter is essential for understanding the
subsequent chapters.


1. CONVEX SETS AND CONVEX FUNCTIONS 


1. BASIC DEFINITIONS 


By the closed line segment joining the points x and y we mean
the set of all points representable as
$\lambda x + (1-\lambda) y$, where $0 \le \lambda \le 1$.

DEFINITION 1.1.1. The set $X \subset E^n$ is convex if the closed line
segment joining every two points of X belongs to X.

DEFINITION 1.1.2. The point $x \in X$ is an interior point of the
set X if for any $y \in E^n$ we can find $\bar\lambda > 0$ such that
$x + \lambda y \in X$ for all $0 \le \lambda \le \bar\lambda$. We denote the set of
all interior points of X by int X.

The convexity of the set X implies the convexity of int X.
We say that a set is open if each of its points is an interior
point. The set X is said to be closed in $E^n$ if the complement
of X in $E^n$, that is, the set $E^n \setminus X$, is open.

DEFINITION 1.1.3. We say that the point x is a limiting point
of the set X if there exists a sequence of points $x_k \in X$
converging to x. The aggregate of all limiting points of the set
X is said to be its closure and is denoted by $\bar{X}$.

DEFINITION 1.1.4. A set X is said to be compact if any sequence
of its points contains a subsequence converging to some point of
X.

In the space $E^n$ the term "compact set" is synonymous with
"bounded closed set."
DEFINITION 1.1.5. A function $f: X \to R \cup \{+\infty\}$ defined on a set
$X \subset E^n$ is called convex on X if for any two points $x, y \in X$ and
$\lambda \in E^1$ such that $0 \le \lambda \le 1$ and $\lambda x + (1-\lambda) y \in X$, the condition

$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$    (1.1)

is satisfied.

If we require, in addition, that for $\lambda \ne 0$, $\lambda \ne 1$ and $x \ne y$
the sign $\le$ be replaced by $<$ in (1.1), then the function f(x) is
said to be strictly convex on X.

If instead of $\le$ we take $\ge$ in (1.1), the function f(x)
is said to be concave on X. If the function f(x) is convex,
the function -f(x) is concave. Thus all the properties of
convex functions are easily applicable, when appropriately modified,
to concave functions.


A function f(x) defined on the convex set X is convex iff
for any $x, y \in X$ and $\lambda \in E^1$ such that $0 \le \lambda \le 1$, (1.1) is
satisfied.

If f(x) is a convex function of x on the entire space
$E^n$, we say simply that the function f(x) is convex. The
Euclidean norm of the vector, $\|x\| = \sqrt{\langle x, x \rangle}$, is a simple
example of such a convex function on $E^n$. Indeed, using the
triangle inequality

$\|\lambda x + (1-\lambda) y\| \le \|\lambda x\| + \|(1-\lambda) y\|$,

and the property of the norm $\|\lambda x\| = |\lambda| \|x\|$, $\|(1-\lambda) y\| = |1-\lambda| \|y\|$,
we obtain for $0 \le \lambda \le 1$

$\|\lambda x + (1-\lambda) y\| \le \lambda \|x\| + (1-\lambda) \|y\|$,

implying in turn the convexity of the function $\|x\|$.
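
The inequality (1.1) for the Euclidean norm can also be spot-checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names (norm, combo, convexity_gap), the dimension and the sampling ranges are illustrative assumptions, and a random test of this kind illustrates the inequality without proving it.

```python
import math
import random

def norm(x):
    """Euclidean norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(sum(xi * xi for xi in x))

def combo(lam, x, y):
    """Convex combination lam*x + (1 - lam)*y of two vectors."""
    return [lam * xi + (1.0 - lam) * yi for xi, yi in zip(x, y)]

def convexity_gap(f, x, y, lam):
    """Right side minus left side of (1.1); nonnegative when (1.1) holds."""
    return lam * f(x) + (1.0 - lam) * f(y) - f(combo(lam, x, y))

random.seed(0)
gaps = []
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    lam = random.random()
    gaps.append(convexity_gap(norm, x, y, lam))

# Smallest observed gap; a small negative tolerance absorbs rounding errors.
min_gap = min(gaps)
print(min_gap >= -1e-12)
```

The same convexity_gap helper can be reused for any other function of a vector argument by passing it in place of norm.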

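An example of an open convex set is the set of points
interior to the n-dimensional sphere of radius $\varepsilon$ centered at the
point $\bar{x}$:

$G_\varepsilon(\bar{x}) = \{x \in E^n : \|x - \bar{x}\| < \varepsilon\}$.

If $x, y \in G_\varepsilon(\bar{x})$, we obtain by the triangle inequality:

$\|\lambda x + (1-\lambda) y - \bar{x}\| = \|\lambda (x - \bar{x}) + (1-\lambda)(y - \bar{x})\| \le \lambda \|x - \bar{x}\| + (1-\lambda) \|y - \bar{x}\| < \lambda \varepsilon + (1-\lambda) \varepsilon = \varepsilon$,

implying in turn the convexity of the set $G_\varepsilon(\bar{x})$.

In the sequel we shall call the set $G_\varepsilon(\bar{x})$ the $\varepsilon$-neighborhood
of the point $\bar{x}$.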
DEFINITION 1.1.6. The function f(x) defined on $E^n$ is said to
be infinitely large if for any positive M there exists R(M)
such that for any x satisfying $\|x\| > R$ the inequality
$f(x) > M$ holds.
We denote by dis(x, X) the distance between the point x
and the set X:

$\mathrm{dis}(x, X) = \inf_{p \in X} \|x - p\|$.

If the set X is convex and closed, in the last equality the
minimum is attained at a unique point $p(x) \in X$, which is called
the projection of the point x onto the set X and is found
from the condition

$p(x) = \mathrm{Arg}\min_{p \in X} \|x - p\|$.    (1.2)

We can show that dis(x, X) is a convex function of x. Let $x_1$
and $x_2$ be arbitrary points of $E^n$ and $0 \le \lambda \le 1$. Then for
any $z \in X$ we have the inequality

$\mathrm{dis}(\lambda x_1 + (1-\lambda) x_2, X) \le \|\lambda x_1 + (1-\lambda) x_2 - z\|$.    (1.3)

Using the definition (1.2), we write $p_1 = p(x_1)$, $p_2 = p(x_2)$,
$p_1, p_2 \in X$. Due to the convexity of the set X, all the points
in the line segment joining $p_1$ and $p_2$ belong to X. Hence,
taking in (1.3) $z = \lambda p_1 + (1-\lambda) p_2$, we obtain

$\mathrm{dis}(\lambda x_1 + (1-\lambda) x_2, X) \le \|\lambda (x_1 - p_1) + (1-\lambda)(x_2 - p_2)\| \le \lambda\, \mathrm{dis}(x_1, X) + (1-\lambda)\, \mathrm{dis}(x_2, X)$,

implying in turn the convexity of the function dis(x, X) on $E^n$.

It is also obvious that for a compact set X this function is
infinitely large.
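
To make the projection (1.2) and the convexity of dis(x, X) concrete, the sketch below takes X to be a box in $E^n$. The box, its bounds LO and HI, and the function names are assumptions made only for this illustration; for a box the projection is known to reduce to clipping each coordinate, which is not true of a general convex set.

```python
import math
import random

# Assumed example set: the box X = [LO, HI]^n, which is convex and closed.
LO, HI = -1.0, 1.0

def project(x):
    """Projection p(x) of x onto the box: clip every coordinate to [LO, HI]."""
    return [min(max(xi, LO), HI) for xi in x]

def dis(x):
    """Distance dis(x, X) = ||x - p(x)|| from the point x to the box X."""
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, project(x))))

def worst_convexity_gap(trials=2000, n=3, seed=1):
    """Smallest sampled value of lam*dis(x1) + (1-lam)*dis(x2) - dis(lam*x1 + (1-lam)*x2)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x1 = [rng.uniform(-4, 4) for _ in range(n)]
        x2 = [rng.uniform(-4, 4) for _ in range(n)]
        lam = rng.random()
        mid = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
        worst = min(worst, lam * dis(x1) + (1 - lam) * dis(x2) - dis(mid))
    return worst

print(project([3.0, 0.5, -2.0]))   # [1.0, 0.5, -1.0]
print(dis([3.0, 0.0, 0.0]))        # 2.0
print(worst_convexity_gap() >= -1e-12)
```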
DEFINITION 1.1.7. By the epigraph (hypograph) of the convex
function f(x), epi f, we mean the set of points of $E^{n+1}$ on the
graph and above the graph of f:

$\mathrm{epi}\, f = \{x \in E^n, \mu \in E^1 : f(x) \le \mu\}$.

DEFINITION 1.1.8. By the effective domain of the convex function
f, dom f, we mean the set of points of $E^n$ in which f(x)
attains finite values or the value $-\infty$:

$\mathrm{dom}\, f = \{x \in E^n : f(x) < +\infty\}$.

The set dom f is the projection of epi f onto $E^n$, i.e.,

$\mathrm{dom}\, f = \{x \in E^n : \exists \mu \in E^1 \text{ such that } (x, \mu) \in \mathrm{epi}\, f\}$.

DEFINITION 1.1.9. The convex function f is called proper if the
set dom f is not empty and $f(x) > -\infty$ on dom f.

In other words, a proper convex function does not attain the
value $-\infty$ and is not identical to $+\infty$. In the sequel, we shall
only consider proper convex functions, without specifying this
property each time.

A convex function is continuous at all interior points
of dom f.

DEFINITION 1.1.10. The nonempty set $K \subset E^n$ is said to be a
convex cone if the following conditions are satisfied:

1) $x + y \in K$ for any $x, y \in K$;

2) $\lambda x \in K$ for any $x \in K$ and any $\lambda \ge 0$.

These conditions are equivalent to the requirement that
$\alpha x + \beta y \in K$ for any $x, y \in K$ and any $\alpha \ge 0$, $\beta \ge 0$.

The nonnegative orthant $E^n_+$ is a convex cone. If A is an
$m \times n$ matrix and $x \in E^n$ or $x \in E^n_+$, the set of all solutions
of the system $Ax \le 0$ is a convex (polyhedral) cone.
By a hyperplane in $E^n$ we mean the set

$\Gamma = \{x \in E^n : \langle c, x \rangle = \alpha\}$,

where $c \in E^n$, $\|c\| \ne 0$, $\alpha$ being real. This set is convex and
always nonempty. If $x_0 \in \Gamma$, the hyperplane $\Gamma$ can be
represented as

$\Gamma = \{x \in E^n : \langle c, x - x_0 \rangle = 0\}$;

it consists of only those points x for which the vector
$x - x_0$ is orthogonal to the vector c. The vector c is said
to be the vector "normal" to the hyperplane $\Gamma$.

We say that the hyperplane $\Gamma$ with normal vector c
separates two nonempty sets X and Y if there exists some $\gamma$ such
that for any $x \in X$, $y \in Y$ the inequalities

$\langle c, x \rangle \ge \gamma \ge \langle c, y \rangle$

are satisfied.

DEFINITION 1.1.11. The hyperplane $\Gamma$ is said to be a "support"
to the set X if $\langle c, x \rangle \ge \alpha$ for all $x \in X$ and $\langle c, y \rangle = \alpha$ for
some point $y \in \bar{X}$. If $y \in X$, the vector c is said to be the
support vector to the set X at the point y and, at the
same time, the vector normal to the hyperplane which is a support
to X and passes through the point y.

THEOREM 1.1.1 (on Separability). Let X be a nonempty convex
set in $E^n$ not containing the origin. Then there exists a
hyperplane separating the set X from the origin.

The proof of this theorem can be found in many works on
Convex Analysis (see, for instance, Nikaido [1], Mangasarian [1],
Rockafellar [1]).
2. PROPERTIES OF CONVEX FUNCTIONS WITHOUT DIFFERENTIABILITY


THEOREM 1.1.2. Let the strictly convex function f(x) attain
the minimum value on $E^n$ at some point $x_*$. Then f(x) is an
infinitely large function.

Proof. Let S be the surface of the sphere of unit radius
centered at the point $x_*$, and let $x_1$ be an arbitrary point
outside S. We draw the straight line between $x_*$ and $x_1$, and
denote by $\bar{x}$ the intersection point of this line with S. We
then write

$\bar{x} = \lambda x_* + (1-\lambda) x_1$, $0 < \lambda < 1$.

We define the value of the coefficient $\lambda$, depending on the
choice of the point $x_1$, by the condition $\|\bar{x} - x_*\| = 1$, and
obtain

$0 < 1 - \lambda(x_1) = \frac{1}{\|x_1 - x_*\|} < 1$.

The convexity condition (1.1) yields
$f(\bar{x}) < \lambda f(x_*) + (1-\lambda) f(x_1)$. Next we find

$0 < \frac{1}{1 - \lambda(x_1)} [f(\bar{x}) - f(x_*)] < f(x_1) - f(x_*)$.

Since a convex function is continuous and S is compact, the
difference $f(\bar{x}) - f(x_*)$ is bounded from below on S by a positive
constant. If $\|x_1\| \to \infty$, then $1 - \lambda(x_1) \to 0$; from these
inequalities we conclude that $f(x_1) \to \infty$, i.e., f(x) is an
infinitely large function. ///
THEOREM 1.1.3. A real-valued function f defined on the convex
set X is convex on X iff its epigraph is a convex set in $E^{n+1}$.

Proof. Let the set epi f be convex, $x, y \in X$. Then

$[x, f(x)] \in \mathrm{epi}\, f$, $[y, f(y)] \in \mathrm{epi}\, f$.

The convexity condition for the epigraph implies that

$[\lambda x + (1-\lambda) y, \lambda f(x) + (1-\lambda) f(y)] \in \mathrm{epi}\, f$

for any $0 \le \lambda \le 1$. By the definition of the epigraph this
implies in turn that

$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$,

i.e., f is convex on X.

To prove necessity, let f be convex on X; $[x, \mu] \in \mathrm{epi}\, f$,
$[y, \nu] \in \mathrm{epi}\, f$. From the convexity property of f we have

$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) \le \lambda \mu + (1-\lambda) \nu$

for any $0 \le \lambda \le 1$. Therefore,

$[\lambda x + (1-\lambda) y, \lambda \mu + (1-\lambda) \nu] \in \mathrm{epi}\, f$

for any $0 \le \lambda \le 1$, i.e., the set epi f is convex in $E^{n+1}$. ///
THEOREM 1.1.4. The function f(x) is convex on $E^n$ iff for any
$x, y \in E^n$ the function

$\psi(\lambda) = f(\lambda x + (1-\lambda) y)$

is convex for any $0 \le \lambda \le 1$.

Proof. Let f be convex, and let x, y be arbitrary points of
$E^n$. We show that the epigraph of the function $\psi(\lambda)$, defined by

$\mathrm{epi}\, \psi = \{[\lambda, \alpha] : 0 \le \lambda \le 1, \psi(\lambda) \le \alpha\}$,

is a convex set. Let $[\lambda_1, \alpha_1] \in \mathrm{epi}\, \psi$, $[\lambda_2, \alpha_2] \in \mathrm{epi}\, \psi$,

$z_1 = \lambda_1 x + (1-\lambda_1) y$, $z_2 = \lambda_2 x + (1-\lambda_2) y$.

We then have

$f(z_1) = f(\lambda_1 x + (1-\lambda_1) y) = \psi(\lambda_1) \le \alpha_1$,
$f(z_2) = \psi(\lambda_2) \le \alpha_2$.

Therefore, $[z_1, \alpha_1], [z_2, \alpha_2] \in \mathrm{epi}\, f$. The set epi f is convex;
hence for any $0 \le \nu \le 1$ we have

$[\nu z_1 + (1-\nu) z_2, \nu \alpha_1 + (1-\nu) \alpha_2] \in \mathrm{epi}\, f$,

yielding

$f(\nu z_1 + (1-\nu) z_2) \le \nu \alpha_1 + (1-\nu) \alpha_2$.

By the definition of the function $\psi$ we obtain

$f(\nu z_1 + (1-\nu) z_2) = \psi(\nu \lambda_1 + (1-\nu) \lambda_2) \le \nu \alpha_1 + (1-\nu) \alpha_2$.

Hence the epigraph of the function $\psi$ is a convex set and,
therefore, the function $\psi$ is convex. The converse is proved
similarly. ///

THEOREM 1.1.5. If the convex function f attains a finite value
at a point $a \in E^n$, the set X defined by

$X = \{x \in E^n : f(x) \le f(a)\}$

is convex and not empty.

Proof. The set X is not empty, since it a priori contains the
point a. For any points $x, y \in E^n$ and $0 \le \lambda \le 1$, (1.1) is
satisfied. We assume in addition that $x, y \in X$. Then

$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) \le f(a)$.

By the arbitrariness of $\lambda$ in the interval [0,1] we conclude
that the closed line segment joining two points x and y in X
belongs to X. Hence the set X is convex. ///
THEOREM 1.1.6. Let $f_i(x)$ be a family of convex functions,
$i \in [1:m]$. Then for any $\alpha_i \ge 0$, $i \in [1:m]$, the following
functions also are convex:

$\varphi_1(x) = \sum_{i=1}^{m} \alpha_i f_i(x)$, $\varphi_2(x) = \max_{i \in [1:m]} f_i(x)$, $\varphi_3(x) = \max_{i \in [1:m]} [0, f_i(x)]$.

Proof. For any x, y and $0 \le \lambda \le 1$ we have the inequalities

$\varphi_1(\lambda x + (1-\lambda) y) = \sum_{i=1}^{m} \alpha_i f_i(\lambda x + (1-\lambda) y) \le \lambda \sum_{i=1}^{m} \alpha_i f_i(x) + (1-\lambda) \sum_{i=1}^{m} \alpha_i f_i(y) = \lambda \varphi_1(x) + (1-\lambda) \varphi_1(y)$,

$\varphi_2(\lambda x + (1-\lambda) y) = f_j(\lambda x + (1-\lambda) y)$,

where j is an integer of the set [1:m]. Hence

$\varphi_2(\lambda x + (1-\lambda) y) \le \lambda f_j(x) + (1-\lambda) f_j(y) \le \lambda \varphi_2(x) + (1-\lambda) \varphi_2(y)$,

implying in turn the convexity of $\varphi_2(x)$. We add the convex
function $f_{m+1}(x) \equiv 0$ to the set of the functions $f_i(x)$, and
arrive at the convexity of $\varphi_3(x)$. ///
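
The three constructions of Theorem 1.1.6 can be tried out on a small family of scalar convex functions. The sketch below is only an illustration under assumed data: the family fs, the weights alphas and the sampling interval are hypothetical choices, and the random check complements the proof rather than replacing it.

```python
import random

# Assumed example family of convex functions of a scalar x.
fs = [lambda x: x * x, lambda x: abs(x - 1.0), lambda x: -x]
alphas = [0.5, 2.0, 0.0]  # nonnegative weights, as the theorem requires

def phi1(x):
    """Nonnegative weighted sum of the f_i."""
    return sum(a * f(x) for a, f in zip(alphas, fs))

def phi2(x):
    """Pointwise maximum of the f_i."""
    return max(f(x) for f in fs)

def phi3(x):
    """Maximum over i of max[0, f_i(x)], i.e. max(0, phi2(x))."""
    return max(0.0, phi2(x))

def worst_gap(g, trials=2000, seed=2):
    """Smallest sampled value of lam*g(x) + (1-lam)*g(y) - g(lam*x + (1-lam)*y)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x, y, lam = rng.uniform(-10, 10), rng.uniform(-10, 10), rng.random()
        worst = min(worst, lam * g(x) + (1 - lam) * g(y) - g(lam * x + (1 - lam) * y))
    return worst

results = {g.__name__: worst_gap(g) for g in (phi1, phi2, phi3)}
print(all(v >= -1e-9 for v in results.values()))
```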
THEOREM 1.1.7. Let the function f(x) be convex and let $\varphi(z)$
be a monotone nondecreasing convex function of z, where z is a
scalar. Then the composite function $\varphi(f(x))$ is convex in x.

Proof. We take advantage of the inequality (1.1). The
monotonicity of $\varphi$ implies that

$\varphi(f(\lambda x + (1-\lambda) y)) \le \varphi(\lambda f(x) + (1-\lambda) f(y))$.

By the convexity of $\varphi$ we have

$\varphi(\lambda f(x) + (1-\lambda) f(y)) \le \lambda \varphi(f(x)) + (1-\lambda) \varphi(f(y))$,

which proves the convexity of the composite function. ///
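
The role of monotonicity in Theorem 1.1.7 can be seen on a one-dimensional example. In the sketch below, the inner function f and the two outer functions are assumed purely for illustration: exp is nondecreasing and convex, so the composition passes the midpoint test, whereas $z^2$ is convex but not monotone on the whole line, and the composition $(x^2 - 1)^2$ violates (1.1).

```python
import math

def f(x):
    """Assumed convex inner function."""
    return x * x - 1.0

def phi_mono(z):
    """Nondecreasing convex outer function."""
    return math.exp(z)

def phi_nonmono(z):
    """Convex outer function that is not monotone on the whole line."""
    return z * z

def gap(g, x, y, lam):
    """Right side minus left side of (1.1) for the scalar function g."""
    return lam * g(x) + (1 - lam) * g(y) - g(lam * x + (1 - lam) * y)

# Midpoint test between x = -1 and y = 1.
good = gap(lambda x: phi_mono(f(x)), -1.0, 1.0, 0.5)     # positive: (1.1) holds here
bad = gap(lambda x: phi_nonmono(f(x)), -1.0, 1.0, 0.5)   # negative: (1.1) fails
print(good > 0, bad < 0)
```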


2. DIFFERENTIABILITY OF CONVEX FUNCTIONS 


1. DIRECTIONAL DIFFERENTIABILITY


THEOREM 1.2.1. If f(t) is a convex function of the scalar
argument t, the function

$\varphi(t) = \frac{f(t) - f(t_0)}{t - t_0}$

for $t > t_0$ is bounded from below and does not decrease as t
increases.

Proof. Let $t_0 < t_1 < t_2$. Then

$t_1 = t_0 + \lambda (t_2 - t_0)$, $\lambda = \frac{t_1 - t_0}{t_2 - t_0}$, $0 < \lambda < 1$.

The convexity property of f(t) implies

$f(t_1) = f(t_0 + \lambda (t_2 - t_0)) = f(\lambda t_2 + (1-\lambda) t_0) \le \lambda f(t_2) + (1-\lambda) f(t_0) = \frac{t_1 - t_0}{t_2 - t_0} f(t_2) + \left(1 - \frac{t_1 - t_0}{t_2 - t_0}\right) f(t_0)$,

yielding in turn

$\varphi(t_1) = \frac{f(t_1) - f(t_0)}{t_1 - t_0} \le \frac{f(t_2) - f(t_0)}{t_2 - t_0} = \varphi(t_2)$,    (2.1)

i.e., $\varphi$ is a monotone nondecreasing function of t for $t > t_0$.

We show that it is bounded as $t \to t_0 + 0$. Let $t_{-1} < t_0 < t$. It is
not difficult to obtain an inequality similar to (2.1):

$\varphi(t) = \frac{f(t) - f(t_0)}{t - t_0} \ge \frac{f(t_0) - f(t_{-1})}{t_0 - t_{-1}}$.

Thus, the function $\varphi(t)$, which does not increase as t decreases
to $t_0$, is bounded from below and therefore the limit

$\lim_{t \to t_0 + 0} \frac{f(t) - f(t_0)}{t - t_0}$

exists, which is called the right derivative of the function f(t)
at the point $t_0$. One can prove similarly that the convex
function f(t) has a left derivative at the point $t_0$. ///
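
The monotone behaviour of the difference quotient in Theorem 1.2.1 is easy to observe numerically. The sketch below uses an assumed convex function $f(t) = e^t + |t|$ with $t_0 = 0$, where the right and left derivatives differ (2 and 0 respectively); the function, the step sizes and the names are illustrative choices, not taken from the text.

```python
import math

def f(t):
    """Assumed convex function with a kink at t = 0."""
    return math.exp(t) + abs(t)

def quotient(t, t0=0.0):
    """Difference quotient (f(t) - f(t0)) / (t - t0)."""
    return (f(t) - f(t0)) / (t - t0)

steps = [2.0, 1.0, 0.5, 0.1, 0.01, 0.001]
right = [quotient(t) for t in steps]    # t decreases towards t0 from the right
left = [quotient(-t) for t in steps]    # t increases towards t0 from the left

# Right quotients shrink towards the right derivative, left ones grow
# towards the left derivative, and every left value stays below every right one.
print(right[-1], left[-1])
```

The last entries of right and left approximate the one-sided derivatives of this particular f at zero.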

THEOREM 1.2.2. Let the convex function f(x) attain a finite
value at a point $x \in E^n$. Then for any direction $q \in E^n$, $\|q\| = 1$,
there exists a derivative of f in the direction q:

$\frac{\partial f(x)}{\partial q} = \lim_{t \to +0} \frac{f(x + tq) - f(x)}{t}$.

Proof. We consider the line segment joining the point x and
$y = x + q$. Also, we define the function of the scalar argument t:

$\psi(t) = f(ty + (1-t) x) = f(x + tq)$.

It follows from Theorem 1.1.4 that the function $\psi(t)$ is convex
for any $0 \le t \le 1$; by Theorem 1.2.1 this implies that
$\psi(t)$ has right and left derivatives. Therefore the limit exists:

$\lim_{t \to +0} \frac{\psi(t) - \psi(0)}{t} = \frac{\partial f(x)}{\partial q}$,

which was to be proved. ///


In our theorem it is of no consequence that the vector of
the direction q be a unit vector; in fact any n-dimensional
vector can play this role. In the sequel we shall write the
derivative of the function f at a point x in the direction of the
vector $q \in E^n$ as

$D^+ f(x, q) = \lim_{t \to +0} \frac{f(x + tq) - f(x)}{t}$,

where + signifies that the derivative is taken on the right.
The derivative on the left can be defined similarly in the
direction q:

$D^- f(x, q) = \lim_{t \to -0} \frac{f(x + tq) - f(x)}{t}$.

It is not hard to see that

$D^- f(x, q) = -D^+ f(x, -q)$.

Using the preceding theorems, we can show that at each point x
at which the convex function f is finite, there exist left and
right derivatives in the arbitrary direction q,

$D^+ f(x, q) \ge D^- f(x, q)$.


2. PROPERTIES OF SUBGRADIENTS


DEFINITION 1.2.1. Let the function f(x) be defined everywhere
on $E^n$. To each point $x \in \mathrm{dom}\, f$ we associate the set

$\partial f(x) = \{z \in E^n : \langle z, y - x \rangle \le f(y) - f(x) \ \forall y \in E^n\}$,    (2.2)

which we call the set of subgradients of the function f at the
point x. A concrete element of $\partial f(x)$ is said to be a
subgradient of the function f(x) at x.
THEOREM 1.2.3. Let the convex function f(x) attain a finite
value at the point x. The vector $z \in \partial f(x)$ iff

$\langle z, p \rangle \le D^+ f(x, p)$    (2.3)

for any vector $p \in E^n$.

Proof. Let the subgradient z satisfy (2.2), assuming
$y = x + tp$, where $t > 0$. In this case, for any p

$t \langle z, p \rangle \le f(x + tp) - f(x)$.    (2.4)

Dividing (2.4) by t, we obtain

$\langle z, p \rangle \le \frac{f(x + tp) - f(x)}{t}$.

We let t go to zero and arrive thus at the required inequality
(2.3).

Conversely, let (2.3) hold. It is not hard to show that for
any vector p the convex function f(x) satisfies

$D^+ f(x, p) \le \frac{f(x + tp) - f(x)}{t}$, $t > 0$.

Hence, from this condition plus (2.3) we have that (2.4) holds
for any p. Letting $y = x + tp$, we arrive at (2.2), which
implies in turn that z is a subgradient of the function f at the
point x. ///

THEOREM 1.2.4. The set of subgradients $\partial f(x)$ of the convex
function f is convex and closed at each point $x \in \mathrm{dom}\, f$.

Proof. Let $z_1, z_2 \in \partial f(x)$. Then for any $y \in E^n$ we have

$\langle z_1, y - x \rangle \le f(y) - f(x)$,    (2.5)
$\langle z_2, y - x \rangle \le f(y) - f(x)$.    (2.6)

Multiplying (2.5) by $\lambda$, (2.6) by $(1-\lambda)$, where $0 \le \lambda \le 1$,
and summing up, we obtain

$\langle \lambda z_1 + (1-\lambda) z_2, y - x \rangle \le f(y) - f(x)$.

Since this inequality holds for any $y \in E^n$, we conclude that

$\lambda z_1 + (1-\lambda) z_2 \in \partial f(x)$,

implying, in turn, the convexity of the set of subgradients.

We prove next the closedness of $\partial f(x)$ at the point x. Let
the sequence $z_k \in \partial f(x)$, $\{z_k\} \to z_*$, and let $z_* \notin \partial f(x)$. Then there
exist $y \in E^n$ and $\varepsilon > 0$ such that

$\langle z_*, y - x \rangle - \varepsilon = f(y) - f(x)$.    (2.7)

For any element $z_k$ of the sequence $\{z_k\}$ the condition

$\langle z_k, y - x \rangle \le f(y) - f(x)$    (2.8)

holds. Subtracting (2.7) from (2.8), we obtain

$\varepsilon \le \langle z_* - z_k, y - x \rangle \le \|z_* - z_k\| \, \|y - x\|$.

This implies that for any $k > 0$

$\|z_* - z_k\| \ge \frac{\varepsilon}{\|y - x\|}$,

but this contradicts the convergence of the sequence $\{z_k\}$ to $z_*$.
Hence we conclude that $z_* \in \partial f(x)$ and the set of subgradients
is therefore closed. ///
From this and the preceding theorems we have that if at x
the set of subgradients of the function f(x) is not empty, then

$D^+ f(x, p) = \max_{z \in \partial f(x)} \langle z, p \rangle$.

The notion of a subgradient is applicable in the case where
the function f is defined only on some set X.

DEFINITION 1.2.2. Let the function f(x) be defined on the set
X the interior of which is not empty. For the point $x \in X$ we
define the set $\partial f(x)$ which is the union of all vectors $z \in E^n$
satisfying

$\langle z, y - x \rangle \le f(y) - f(x)$

for any $y \in X$. The set $\partial f(x)$ is said to be the set of
subgradients of the function f at the point $x \in X$.

For a convex function the set of subgradients is nonempty,
bounded, and convex at any interior point of the domain of definition.
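
The relation between the directional derivative $D^+ f(x, p)$ and the maximum of $\langle z, p \rangle$ over the subgradients can be illustrated on a simple nonsmooth function. The sketch below assumes $f(x) = |x^1| + |x^2|$ and the point x = (0, 1); for this example the set of subgradients is taken to be the segment of vectors $(z^1, 1)$ with $-1 \le z^1 \le 1$, a standard fact that is stated here as an assumption rather than derived. A linear function attains its maximum over a segment at an end point, so only the two end points are enumerated.

```python
def f(x):
    """Assumed nonsmooth convex function |x^1| + |x^2|."""
    return abs(x[0]) + abs(x[1])

x = [0.0, 1.0]
# End points of the assumed subgradient set {(z1, 1) : -1 <= z1 <= 1} at x.
vertices = [(-1.0, 1.0), (1.0, 1.0)]

def dplus(p, t=1e-6):
    """One-sided difference quotient approximating D+ f(x, p)."""
    y = [xi + t * pi for xi, pi in zip(x, p)]
    return (f(y) - f(x)) / t

def support(p):
    """max of <z, p> over the end points of the subgradient set."""
    return max(z[0] * p[0] + z[1] * p[1] for z in vertices)

directions = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.3, -2.0), (-2.0, 0.5)]
errors = [abs(dplus(p) - support(p)) for p in directions]
print(max(errors) < 1e-6)
```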


3. PROPERTIES OF CONVEX DIFFERENTIABLE FUNCTIONS 


THEOREM 1.2.5. Let f be a convex function defined on an open
set $X \subset E^n$, and let f be differentiable at $\bar{x} \in X$. Then

$\langle f_x(\bar{x}), x - \bar{x} \rangle \le f(x) - f(\bar{x})$ for any $x \in X$.    (2.9)

Proof. Since $\bar{x}$ belongs to the open set X, there exists a
neighborhood $G_\varepsilon(\bar{x})$ of the point $\bar{x}$ such that $G_\varepsilon(\bar{x})$ belongs
entirely to X. Let $x \in X$ and $x \ne \bar{x}$. We connect the points x
and $\bar{x}$ by the straight line and put

$y = \bar{x} + \mu (x - \bar{x})$, $0 < \mu < 1$.    (2.10)

If $0 < \mu < \varepsilon / \|x - \bar{x}\|$, then $y \in G_\varepsilon(\bar{x}) \subset X$.

The convexity of f(x) on the convex set $G_\varepsilon(\bar{x})$ implies that

$f((1-\lambda) \bar{x} + \lambda y) \le (1-\lambda) f(\bar{x}) + \lambda f(y)$.

Now we make use of the differentiability of f at the point $\bar{x}$
(see Appendix I):

$f(y) - f(\bar{x}) \ge \frac{f((1-\lambda) \bar{x} + \lambda y) - f(\bar{x})}{\lambda} = \langle f_x(\bar{x}), y - \bar{x} \rangle + \|y - \bar{x}\| \, \alpha(\bar{x}, \lambda (y - \bar{x}))$.    (2.11)

Here the function $\alpha$ satisfies the condition

$\lim_{\lambda \to 0} \alpha(\bar{x}, \lambda (y - \bar{x})) = 0$.

Passing in (2.11) to the limit as $\lambda \to 0$, we obtain

$\langle f_x(\bar{x}), y - \bar{x} \rangle \le f(y) - f(\bar{x})$.    (2.12)

The point y belongs to a convex set; hence, using (2.10) and
(1.1), we obtain

$f(y) - f(\bar{x}) = f(\bar{x} + \mu (x - \bar{x})) - f(\bar{x}) \le \mu [f(x) - f(\bar{x})]$.

Substituting this expression in (2.12) and taking into account
(2.10), we obtain the required result, (2.9). ///

THEOREM 1.2.6. Let f be a differentiable function on an open
convex set X. In this case f is convex on X iff the inequality

$\langle f_x(x_1), x_2 - x_1 \rangle \le f(x_2) - f(x_1)$    (2.13)

is satisfied for any $x_1, x_2 \in X$.

Proof. Necessity follows from the preceding theorem. We prove
sufficiency. If $x_1, x_2 \in X$, $0 \le \lambda \le 1$, the convexity of X
implies that $x_1 + \lambda (x_2 - x_1) \in X$. From (2.13) we obtain

$\lambda \langle f_x(x_1 + \lambda (x_2 - x_1)), x_1 - x_2 \rangle \le f(x_1) - f(x_1 + \lambda (x_2 - x_1))$,

$-(1-\lambda) \langle f_x(x_1 + \lambda (x_2 - x_1)), x_1 - x_2 \rangle \le f(x_2) - f(x_1 + \lambda (x_2 - x_1))$.

Multiplying the first of these inequalities by $(1-\lambda)$, the second
by $\lambda$, and adding them, we obtain the inequality

$f(\lambda x_2 + (1-\lambda) x_1) \le \lambda f(x_2) + (1-\lambda) f(x_1)$,

implying in turn the convexity of the function f. ///

If we interchange $x_1$ and $x_2$ in (2.13), we obtain the
following formula:

$\langle f_x(x_1), x_2 - x_1 \rangle \le f(x_2) - f(x_1) \le \langle f_x(x_2), x_2 - x_1 \rangle$.    (2.14)
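
The two-sided estimate (2.14) lends itself to a direct numerical check. The sketch below assumes a particular smooth convex function of two variables together with its hand-computed gradient; the function, the sampling box and the tolerance are illustrative assumptions, and the check is a random spot test rather than a verification of convexity.

```python
import math
import random

def f(x):
    """Assumed smooth convex function of two variables."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2 + math.exp(x[0])

def grad(x):
    """Gradient f_x(x) of the function above, computed by hand."""
    return [2 * x[0] + x[1] + math.exp(x[0]), x[0] + 2 * x[1]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def check_2_14(x1, x2, tol=1e-9):
    """True if <f_x(x1), x2-x1> <= f(x2)-f(x1) <= <f_x(x2), x2-x1> up to tol."""
    d = [b - a for a, b in zip(x1, x2)]
    lower = dot(grad(x1), d)
    upper = dot(grad(x2), d)
    diff = f(x2) - f(x1)
    return lower - tol <= diff <= upper + tol

rng = random.Random(3)
pairs = [([rng.uniform(-3, 3), rng.uniform(-3, 3)],
          [rng.uniform(-3, 3), rng.uniform(-3, 3)]) for _ in range(1000)]
all_ok = all(check_2_14(x1, x2) for x1, x2 in pairs)
print(all_ok)
```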


THEOREM 1.2.7. Let f be a differentiable function on an open
convex set X. In this case, f is convex on X iff the inequality

$0 \le \langle f_x(x_2) - f_x(x_1), x_2 - x_1 \rangle$    (2.15)

is satisfied for any $x_1, x_2 \in X$.

Proof. Let f(x) be convex on X and let $x_1, x_2 \in X$. By the
preceding Theorem the following inequalities hold:

$\langle f_x(x_1), x_2 - x_1 \rangle \le f(x_2) - f(x_1)$,
$\langle f_x(x_2), x_1 - x_2 \rangle \le f(x_1) - f(x_2)$.

Adding these inequalities, we arrive at (2.15).

We prove sufficiency. Let $x_1, x_2 \in X$. Then we have
$x_1 + \lambda (x_2 - x_1) \in X$ for any $0 \le \lambda \le 1$. By the mean value
theorem (see Appendix I) there exists $\bar\lambda$, $0 < \bar\lambda < 1$, such that

$f(x_2) - f(x_1) = \langle f_x(x_1 + \bar\lambda (x_2 - x_1)), x_2 - x_1 \rangle$.    (2.16)

By condition (2.15) we have

$0 \le \langle f_x(x_1 + \bar\lambda (x_2 - x_1)) - f_x(x_1), \bar\lambda (x_2 - x_1) \rangle$,

yielding

$\langle f_x(x_1), x_2 - x_1 \rangle \le \langle f_x(x_1 + \bar\lambda (x_2 - x_1)), x_2 - x_1 \rangle$.

Taking into account (2.16), we obtain

$\langle f_x(x_1), x_2 - x_1 \rangle \le f(x_2) - f(x_1)$,

implying, by the preceding Theorem, the convexity of the function
f. ///

The next two theorems can be proved in a similar manner.

THEOREM 1.2.8. Let the function f(x), strictly convex on an
open set $X \subset E^n$, be differentiable at a point $\bar{x} \in X$. Then the
inequality

$\langle f_x(\bar{x}), x - \bar{x} \rangle < f(x) - f(\bar{x})$

holds for any $x \in X$, $x \ne \bar{x}$.

THEOREM 1.2.9. If the function f is differentiable on an open
convex set X, then f is strictly convex on X iff the inequality

$\langle f_x(x_1), x_2 - x_1 \rangle < f(x_2) - f(x_1)$

holds for any $x_1, x_2 \in X$, $x_1 \ne x_2$.

The assertions of these theorems are carried over to concave
functions, the signs $\le$, $<$ being replaced respectively by $\ge$, $>$.


4. PROPERTIES OF TWICE-DIFFERENTIABLE CONVEX FUNCTIONS


THEOREM 1.2.10. Let $f$ be a convex function defined on an open
set $X \subset E^n$ and let $f$ be twice differentiable at $\bar{x} \in X$. Then
the matrix $f_{xx}(\bar{x})$ is nonnegative definite, i.e., the inequality

$$0 \le y^T f_{xx}(\bar{x}) y \qquad (2.17)$$

holds for any $y \in E^n$.
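Proof. For any $y$ there exists a number $\bar\lambda > 0$ such that for
any $0 < \lambda < \bar\lambda$ the points $\bar{x} + \lambda y \in X$ and, by Theorem 1.2.5,

$$0 \le f(\bar{x} + \lambda y) - f(\bar{x}) - \lambda\langle f_x(\bar{x}), y\rangle.$$

On the other hand, noting the property of differentiability,
we obtain

$$f(\bar{x} + \lambda y) - f(\bar{x}) - \lambda\langle f_x(\bar{x}), y\rangle = \frac{\lambda^2}{2}\, y^T f_{xx}(\bar{x}) y + \lambda^2 \|y\|^2 \beta(\bar{x}, \lambda y),$$

where $\lim_{\lambda \to 0} \beta(\bar{x}, \lambda y) = 0$. Hence

$$0 \le \frac{1}{2}\, y^T f_{xx}(\bar{x}) y + \|y\|^2 \beta(\bar{x}, \lambda y).$$

Letting $\lambda$ go to zero, we arrive at (2.17). ///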

THEOREM 1.2.11. If the function $f$ is twice differentiable on an
open convex set $X$, then $f$ is convex on $X$ iff the matrix
$f_{xx}(x)$ is nonnegative definite on $X$.

Proof. Necessity follows from the preceding Theorem. Let us
prove sufficiency. If $x_1, x_2 \in X$, then, by Taylor's formula
(see Appendix I), the equality

$$f(x_2) - f(x_1) - \langle f_x(x_1), x_2 - x_1\rangle = \frac{1}{2}(x_2 - x_1)^T f_{xx}(x_1 + \lambda(x_2 - x_1))(x_2 - x_1)$$

holds for some $0 < \lambda < 1$. The right side of this equality is
nonnegative, hence the left side also is nonnegative and, by
Theorem 1.2.6, the function $f$ is convex on $X$. ///
The next theorem is also simple to prove. 
THEOREM 1.2.12. Let the function $f$ be twice differentiable on
an open convex set $X$. A sufficient condition for $f$ to be
strictly convex on $X$ is that the matrix $f_{xx}(x)$ be positive
definite on $X$.


5. PSEUDOCONVEX FUNCTIONS 


DEFINITION 1.2.3. Let the function $f$ be defined on some open
set in $E^n$ containing the set $X$. The function $f$ is said to
be pseudoconvex at the point $\bar{x}$ with respect to the set $X$, if
it is differentiable at $\bar{x}$ and

$$0 \le \langle f_x(\bar{x}), x - \bar{x}\rangle \qquad (2.18)$$

for all $x \in X$ implies that $f(\bar{x}) \le f(x)$.

If the function $f$ is defined everywhere on $E^n$, is differentiable
at the point $\bar{x}$, and the inequality (2.18) for all
$x \in E^n$ implies that $f(\bar{x}) \le f(x)$ for any $x$, we say that the
function $f$ is pseudoconvex at the point $\bar{x}$.


Assume that the function $f$ is convex on a convex set $X$,
is differentiable at a point $\bar{x} \in X$ at which the function $f$ attains
the minimum on $X$. Then the function $f$ is pseudoconvex
at the point $\bar{x}$ with respect to the set $X$. Indeed, in this
case the inequality (2.9) holds; hence the fact that (2.18) is
satisfied implies automatically that $f(\bar{x}) \le f(x)$ for any $x \in X$.


3. NECESSARY AND SUFFICIENT CONDITIONS OF 


A LOCAL EXTREMUM OF FUNCTIONS OF MANY VARIABLES 


1. BASIC DEFINITIONS


Let the function $f$ be defined on some set $X \subset E^n$. We say that
$x_*$ is a local minimum point of the function $f$ on the set $X$,
if there exists a neighborhood $G(x_*)$ of the point $x_*$ such
that the inequality

$$f(x_*) \le f(x) \qquad (3.1)$$

holds for all $x$ belonging to the intersection of the set $X$ and
the set $G(x_*)$. If the strict inequality

$$f(x_*) < f(x) \qquad (3.2)$$

holds for all $x \in X \cap G(x_*)$, $x \ne x_*$, we say that $x_*$ is a
local isolated or a local strict minimum of the function $f(x)$.

If the inequality (3.1) holds for any $x \in X$, $x_*$ is said
to be a minimum or, more precisely, a global minimum of the function
$f$ on $X$. The problem of finding the minimum of the function
$f$ on the set $X$ is written as

$$\min_{x \in X} f(x). \qquad (3.3)$$

We denote by

$$X_* = \operatorname{Arg\,min}_{x \in X} f(x)$$

the set of all the points $x_* \in X$ satisfying (3.1) for any $x \in X$
and call it the set of solutions to the problem (3.3). Usually,
in minimization problems it is sufficient to find at least one
point $x_* \in X_*$.


In those cases where $X$ coincides with the entire space $E^n$,
(3.3) is said to be the problem of unconstrained minimization of
the function $f$, and the point $x_*$ satisfying (3.1) for any $x \in E^n$
is said to be the global minimum of the function $f(x)$. If (3.1)
is satisfied for all $x \in G(x_*)$, $x_*$ is said to be an unconstrained
local minimum of the function $f(x)$. When (3.2) is satisfied for
all $x \in G(x_*)$, $x \ne x_*$, the point $x_*$ is said to be a local
isolated or a local strict minimum of the function $f(x)$.


2. NECESSARY AND SUFFICIENT CONDITIONS FOR A LOCAL MINIMUM 


DEFINITION 1.3.1. The point $x_*$ is said to be a stationary point
of the differentiable function $f(x)$, if

$$f_x(x_*) = 0. \qquad (3.4)$$


THEOREM 1.3.1 (Necessary Conditions for the Minimum). Let the
local minimum of the function $f$ on the set $X$ be attained at
the point $x_* \in \operatorname{int} X$. Then:

1. if the function $f$ is differentiable at $x_*$, then $x_*$
is the stationary point of the function $f$;

2. if the function $f$ is twice differentiable at the point
$x_*$, the matrix $f_{xx}(x_*)$ is nonnegative definite.


Proof. It follows from $x_* \in \operatorname{int} X$ that for an arbitrary
vector $y \in E^n$ there exists $\bar\lambda > 0$ such that for any $0 < \lambda \le \bar\lambda$
the point $x_* + \lambda y \in G(x_*) \subset X$. Using the differentiability of
$f$, we write the condition (3.1) as

$$0 \le f(x_* + \lambda y) - f(x_*) = \lambda\langle f_x(x_*), y\rangle + \lambda\|y\|\alpha(x_*, \lambda y), \qquad (3.5)$$

where the function $\alpha$ is such that $\lim_{\lambda \to 0} \alpha(x_*, \lambda y) = 0$. Taking
into account this property, we find from (3.5):

$$\lim_{\lambda \to 0} \frac{f(x_* + \lambda y) - f(x_*)}{\lambda} = \langle f_x(x_*), y\rangle.$$

Hence (3.5) implies

$$0 \le \langle f_x(x_*), y\rangle.$$

To have this inequality satisfied for an arbitrary $y$, it is
necessary that the condition (3.4) be satisfied.
If the function $f$ is twice differentiable at the point $x_*$,
instead of (3.5) we use the following formula:

$$0 \le f(x_* + \lambda y) - f(x_*) = \frac{\lambda^2}{2}\, y^T f_{xx}(x_*) y + \lambda^2\|y\|^2\beta(x_*, \lambda y), \qquad (3.6)$$

where $\lim_{\lambda \to 0} \beta(x_*, \lambda y) = 0$. Letting $\lambda$ go to zero, we obtain

$$\lim_{\lambda \to 0} \frac{f(x_* + \lambda y) - f(x_*)}{\lambda^2} = \frac{1}{2}\, y^T f_{xx}(x_*) y \ge 0.$$

By the arbitrariness of the vector $y \in E^n$ the last formula
implies that the matrix $f_{xx}(x_*)$ is nonnegative definite, thus
completing the proof. ///

THEOREM 1.3.2 (A Sufficient Condition for the Local Minimum).
Let the function $f$ defined on $X$ be twice differentiable at
the point $x_* \in \operatorname{int} X$, and let the stationarity condition (3.4)
be satisfied; also let the matrix $f_{xx}(x_*)$ be positive definite.
Then $x_*$ is an unconstrained local isolated minimum of the function $f$.
Proof. The positive definiteness of the matrix $f_{xx}(x_*)$ implies
that there exists $c > 0$ such that $y^T f_{xx}(x_*) y \ge 2c\|y\|^2$ for all
$y \in E^n$. We write $\Delta x = x - x_*$ and then obtain from the definition
of the second derivative, taking into account (3.4) and
(3.6), that

$$f(x) - f(x_*) = \frac{1}{2}\, \Delta x^T f_{xx}(x_*) \Delta x + \|\Delta x\|^2 \beta(x_*, \Delta x) \ge (c + \beta(x_*, \Delta x))\|\Delta x\|^2.$$

The function $\beta(x_*, \Delta x)$ is such that there exists a neighborhood
$G(x_*)$ in which $c + \beta(x_*, \Delta x) > 0$ for all $x \in G(x_*)$. Hence we
arrive at the inequality (3.2) holding for any $x \in G(x_*)$, $x \ne x_*$,
which was to be proved. ///
3. NECESSARY AND SUFFICIENT CONDITIONS FOR A LOCAL MAXIMUM 
It is possible to consider the problem of finding the maximum of
the differentiable function $f$ on the set $X$:

$$\max_{x \in X} f(x). \qquad (3.7)$$

If the maximum in this problem is attained at the interior
point $x_*$ of the set $X$, it is necessary that the condition
(3.4) be satisfied at this point; and, if the function $f$ is
twice differentiable at $x_*$, the matrix $f_{xx}(x_*)$ has to be
nonpositive definite, that is, for any $y \in E^n$ we have the inequality

$$y^T f_{xx}(x_*) y \le 0.$$

A sufficient condition for the strict local maximum of the function
$f$ at the point $x_*$ satisfying (3.4) is that the matrix
$f_{xx}(x_*)$ be negative definite.

Thus, if the local maximum or minimum of the differentiable
function $f$ is attained at the interior point $x_*$ of the set $X$,
the condition (3.4) is satisfied at this point. Hence in solving
extremal problems it is sometimes useful to find stationary
points. The problem of finding stationary points of the differentiable
function $f$ coincides with the problem of solving the
system of equations $f_x(x) = 0$. Hence to solve the problems (3.3)
and (3.7) one can use methods for solving systems of equations,
some of which will be given in Chapter 2.


4. SYLVESTER'S CRITERION


Let $\Delta_1(x), \Delta_2(x), \ldots, \Delta_n(x)$ denote the sequential principal
minors of the matrix $f_{xx}(x)$.

THEOREM 1.3.3 (Sylvester's Criterion). In order that the matrix
$f_{xx}(x)$ be positive definite, it is necessary and sufficient that
the conditions

$$\Delta_1(x) > 0, \quad \Delta_2(x) > 0, \quad \ldots, \quad \Delta_n(x) > 0 \qquad (3.8)$$

be satisfied. In order that the matrix $f_{xx}(x)$ be negative definite,
it is necessary and sufficient that the conditions

$$\Delta_1(x) < 0, \quad \Delta_2(x) > 0, \quad \ldots, \quad (-1)^n \Delta_n(x) > 0 \qquad (3.9)$$

be satisfied.
The proof of the criterion can be found in any textbook on
linear algebra. To each symmetric matrix $f_{xx}(x)$ it is possible
to associate the quadratic form

$$z^T f_{xx}(x) z, \qquad z \in E^n.$$

The inequalities (3.8) and (3.9) yield necessary and sufficient
conditions for this quadratic form to be positive definite
or negative definite. In textbooks on mathematical analysis
(see, for instance, G.M. Fikhtengol'ts [1]), only those cases
are usually considered where the conditions (3.8) and (3.9) are
satisfied; all of the other possible cases are "indefinite" since they
provide no sufficient conditions for the extremum of the functions.
Later on, in Section 5, we shall show that the cases where in (3.8)
and (3.9) the inverse strict inequalities hold correspond to the
sufficient conditions of the local maximin and minimax.
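The conditions (3.8) and (3.9) can be checked mechanically. The following Python sketch computes the sequential principal minors of a symmetric matrix given as a list of rows and classifies the matrix accordingly. The function names and the choice of Gaussian elimination for the determinants are assumptions made here for illustration; for ill-conditioned matrices the signs of computed minors may be unreliable.

```python
def det(m):
    # Determinant by Gaussian elimination with partial pivoting.
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            r = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= r * a[k][j]
    return d


def leading_minors(h):
    # Delta_1, ..., Delta_n: determinants of the upper-left k x k blocks.
    return [det([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]


def classify(h):
    minors = leading_minors(h)
    if all(d > 0 for d in minors):  # conditions (3.8)
        return "positive definite"
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):  # (3.9)
        return "negative definite"
    return "neither"
```

Applied to the matrix $f_{xx}(x_*)$ at a stationary point, the first outcome gives the sufficient condition of Theorem 1.3.2 for a minimum and the second the sufficient condition for a strict local maximum.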


5. CONVEX FUNCTIONS 


THEOREM 1.3.4. Let the function $f$ be convex on $E^n$. Then each
local minimum of the function $f$ is at the same time a global minimum
on $E^n$; the set $X_*$ of points of the minimum of the function $f$
is convex. If the function $f$ is strictly convex, $X_*$ consists
of a single point.


Proof. Let the condition (3.1) be satisfied for all $x \in G(x_*)$.
We take an arbitrary point $y \in E^n$; then for sufficiently small
$0 < \lambda < 1$ we have $x_* + \lambda(y - x_*) \in G(x_*)$. Using the convexity of
$f$, we obtain

$$f(x_*) \le f(x_* + \lambda(y - x_*)) \le \lambda f(y) + (1-\lambda) f(x_*),$$

which implies that $f(x_*) \le f(y)$. By the arbitrariness of $y \in E^n$
we conclude that $x_*$ is a point of the global minimum of the
function $f$.

Letting $c = f(x_*)$, we see that, by Theorem 1.1.5, the set
of minima

$$X_* = \{x \in E^n : f(x) \le c\}$$

is convex.

Assume that the minimum of a strictly convex function $f$ is
attained at two distinct points $x_1, x_2 \in X_*$. Then the convexity
of $X_*$ and the strict convexity of $f$ imply that

$$c = f(\lambda x_1 + (1-\lambda) x_2) < \lambda f(x_1) + (1-\lambda) f(x_2) = c$$

for any $0 < \lambda < 1$. This contradiction implies in turn the
uniqueness of the minimum point. ///
THEOREM 1.3.5. Let $f$ be a convex function on $E^n$. The condition
$0 \in \partial f(x_*)$ holds iff $f$ attains its global minimum on $E^n$
at the point $x_*$.
Proof. From the definition of the set of subgradients it follows
that $0 \in \partial f(x_*)$ iff $f(x) - f(x_*) \ge 0$ for any $x \in E^n$. Then we
infer that $x_*$ is the global minimum of $f(x)$. ///

If the function $f$ is convex on $E^n$ and differentiable at
the point $x_*$, then $f_x(x_*) = 0$ iff $x_*$ is the point of the
global minimum of the function $f$.
4. NECESSARY AND SUFFICIENT CONDITIONS


FOR A MINIMUM OF FUNCTIONS ON SETS 


1. BASIC DEFINITIONS 


We assume that the function $f$ is defined on some open set
containing the set $X$. We also assume that the set

$$X_* = \operatorname{Arg\,min}_{x \in X} f(x)$$

is not empty. At those points $x \in X$ where the function $f$ is
differentiable it is possible to determine the point-set mapping

$$W(x) = \operatorname{Arg\,min}_{y \in X} \langle f_x(x), y - x\rangle. \qquad (4.1)$$


This implies that $W(x) \subset X$. If the function $f$ is differentiable
everywhere on $X$, the condition (4.1) defines the set
$W(x) \subset X$ for each point $x \in X$. Thus, $W$ is a point-set
mapping of the set $X$ into itself. The fixed points of the mapping
$W(x)$ are such that

$$x \in W(x). \qquad (4.2)$$

If $x \notin W(x)$, there exists a point $y \in X$ such that

$$\langle f_x(x), y - x\rangle < 0.$$

If $x \in W(x)$, we have the inequality

$$0 \le \langle f_x(x), y - x\rangle \quad \forall y \in X, \qquad (4.3)$$

the latter being called sometimes "variational inequality"; each
point $x \in X$ at which (4.3) is satisfied is said to be the solution
of the variational inequality. For a point $x \in X$ to be a fixed point
of the mapping $W$, it is necessary and sufficient that it be a
solution of the variational inequality (4.3).


At each fixed point $x \in X$ where $f_x(x) \ne 0$, the set

$$K(x) = \{y \in E^n : \langle f_x(x), y - x\rangle = 0\}$$

is a hyperplane passing through the point $x$, with normal $f_x(x)$.
At the same time, $K(x)$ is the tangent plane to the hypersurface

$$\Gamma(x) = \{y \in E^n : f(y) = f(x)\}$$

at the point $x$. The set $\Gamma(x)$ is the level set of the function
$f(x)$; the gradient $f_x(x)$ is directed along the normal to the
level set, into the set

$$\{y \in E^n : f(y) > f(x)\}.$$

The hyperplane $K(x)$ generates two half-spaces:

$$0 \le \langle f_x(x), y - x\rangle, \qquad \langle f_x(x), y - x\rangle \le 0.$$

Let the point $x \in X$ be such that (4.3) is satisfied at this
point. Then the former half-space contains the set $X$. Hence
(4.3) can be interpreted as the condition for the set $X$ to be
contained entirely in one of the two closed half-spaces defined
by the tangent hyperplane $K(x)$. By the definition of a support
hyperplane, in this situation the tangent hyperplane $K(x)$ is a
support to the set $X$ at the point $x$, and the vector $f_x(x)$ is a
support vector to the set $X$ at the point $x$.
2. FIRST ORDER CONDITIONS FOR A MINIMUM


THEOREM 1.4.1. Let the function $f$ be defined on an open set
in $E^n$ containing a convex set $X$, and let there be a point
$x_* \in X_*$ at which the function $f$ is differentiable. Then it is
necessary that $x_*$ be a fixed point of the mapping $W(x)$.

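Proof. Let $x$ be an arbitrary point of the set $X$. By the definition
of a convex set, the convexity of the set $X$ implies that for any
$\lambda \in [0,1]$ we have the inclusion $x_* + \lambda(x - x_*) \in X$. Since $f$ is
differentiable at the point $x_*$ and $x_* \in X_*$, we have

$$0 \le f(x_* + \lambda(x - x_*)) - f(x_*) = \lambda\langle f_x(x_*), x - x_*\rangle + \lambda\|x - x_*\|\alpha(x_*, \lambda(x - x_*)),$$

where

$$\lim_{\lambda \to 0} \alpha(x_*, \lambda(x - x_*)) = 0;$$

then we obtain that for any $\lambda \in (0,1]$

$$0 \le \langle f_x(x_*), x - x_*\rangle + \|x - x_*\|\alpha(x_*, \lambda(x - x_*)).$$

Letting $\lambda$ go to zero, we arrive at the inequality

$$0 \le \langle f_x(x_*), x - x_*\rangle \qquad (4.4)$$

holding for any $x \in X$, which implies in turn that $x_* \in W(x_*)$. ///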
This theorem provides a necessary condition for the minimum
of the function $f$ to be global on the set $X$. As will be shown
later (in Section 6.4), this necessary condition is closely connected
with a discrete version of Pontryagin's maximum principle
in optimal control theory.
The requirement for the set $X$ to be convex, introduced in
the Theorem, is essential. It is easy to give examples to show
that a violation of this requirement makes the assertion of the
Theorem wrong. Nevertheless, a local version of the Theorem is
possible which does not require the set $X$ to be convex. The
inequality (4.3) will hold then only for the vectors $y$ which
belong to the intersection of $X$ and some neighborhood of $X_*$.
THEOREM 1.4.2. Let the function $f$ be defined on an open set in
$E^n$ containing $X$, let the set $X_*$ be nonempty and convex, and
let there exist a point $x_* \in X_*$ at which the function $f$ is
differentiable. Then there exists a neighborhood $G(X_*)$ of the
set $X_*$ such that the inequality (4.4) holds for all
$x \in X \cap G(X_*)$.

Definition 1.2.3 of pseudoconvexity implies the following
theorem providing sufficient conditions for the global minimum
of the function $f$ on the set $X$:

THEOREM 1.4.3. Let the function $f$ be pseudoconvex at a point
$\bar{x}$ with respect to the set $X$, and let $\bar{x}$ be a fixed point of
the mapping $W$. Then $\bar{x} \in X_*$.


3. THE CASE OF CONVEX FUNCTIONS 


We assume first that the function $f(x)$ is defined and convex on
an open set containing the set $X$, there exists a point $x_* \in X_*$
and the function $f$ is differentiable at the points $x \in X$. Then
from (2.14) we have the inequality

$$0 \le f(x) - f(x_*) \le \langle f_x(x), x - x_*\rangle, \qquad (4.5)$$

which holds for all $x$ belonging to $X$. Since $x_* \in X$, for any
$\bar{x} \in W(x)$ we have

$$\langle f_x(x), \bar{x} - x\rangle \le \langle f_x(x), x_* - x\rangle.$$

Hence (4.5) can be expressed in the form

$$0 \le f(x) - f(x_*) \le \langle f_x(x), x - \bar{x}\rangle \quad \forall x \in X, \ \forall \bar{x} \in W(x). \qquad (4.6)$$

We shall be using this inequality in the sequel.


Let $x \in X$, $x \in W(x)$. Then

$$0 = \langle f_x(x), x - \bar{x}\rangle \quad \forall \bar{x} \in W(x),$$

and for any $y \in X$ (4.3) is satisfied. From the inequality
(4.6) we infer that in this case $x \in X_*$. For any $x \notin X_*$,
$x \in X$, by (4.5), we have

$$\langle f_x(x), x_* - x\rangle < 0.$$

The inequality (4.3) is therefore violated for $y = x_*$.

We arrive at the following result: 

In order that the function $f$, convex and differentiable at
the point $x \in X$, attain at $x$ the minimum on the convex set
$X$, it is necessary and sufficient that the condition (4.3) hold
(that is, the point $x$ has to be a fixed point of the mapping
$W(x)$ or the gradient $f_x(x)$ has to be a support vector of the
set $X$ at the point $x$).
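The variational inequality (4.3) is easy to test when $X$ is a box, because the linear function $\langle f_x(x), y - x\rangle$ is then minimized coordinate by coordinate. The Python sketch below does this for the assumed example $f(x) = (x_1 - 2)^2 + (x_2 + 1)^2$ on $X = [0,1]^2$; the example, the function names and the tolerance are illustrative assumptions.

```python
def grad_f(x):
    # Gradient of the assumed example f(x) = (x1 - 2)^2 + (x2 + 1)^2.
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]


def linear_min_over_box(g, x, lower, upper):
    # One element y of W(x) from (4.1) for a box X, and the minimal value
    # of <g, y - x>; each coordinate is pushed against the sign of g.
    y = [lo if gi > 0 else hi if gi < 0 else xi
         for gi, xi, lo, hi in zip(g, x, lower, upper)]
    value = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
    return y, value


def solves_variational_inequality(x, lower, upper, tol=1e-12):
    # For x in X the minimal value is never positive; x is a fixed point
    # of W, i.e. (4.3) holds, exactly when the minimal value is zero.
    _, value = linear_min_over_box(grad_f(x), x, lower, upper)
    return value >= -tol
```

For this example the point $[1, 0]$, the projection of the unconstrained minimizer $[2, -1]$ onto the box, satisfies (4.3), while the interior point $[0.5, 0.5]$ does not.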


4. SECOND ORDER NECESSARY CONDITION FOR A MINIMUM
It may happen that the condition for a stationary point

$$f_x(x) = 0 \qquad (4.7)$$

is satisfied at the point $x \in X$. If the function $f$ is convex,
this implies that $x \in X_*$; in the general case the property (4.7)
makes the condition (4.2) rather meaningless. Hence one needs to
introduce necessary conditions of higher orders.

THEOREM 1.4.4 (Second-Order Necessary Condition for a Minimum).
Let the function $f$ be defined on an open set of $E^n$ containing
a convex set $X$, and let the function $f$ be twice differentiable
at the point $x_* \in X_*$ at which (4.7) is satisfied. Then it is
necessary that the inequality

$$0 \le (x - x_*)^T f_{xx}(x_*)(x - x_*) \qquad (4.8)$$

hold for any $x \in X$.
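Proof. Let $x$ be an arbitrary point of the set $X$. It follows
from the convexity of $X$ that the inclusion

$$x_* + \lambda(x - x_*) \in X$$

holds for any $\lambda \in [0,1]$. Using the condition that the function
$f$ is twice differentiable at the point $x_* \in X_*$ and the
stationarity condition (4.7), we obtain

$$0 \le f(x_* + \lambda(x - x_*)) - f(x_*) = \frac{\lambda^2}{2}(x - x_*)^T f_{xx}(x_*)(x - x_*) + \lambda^2\|x - x_*\|^2\beta(x_*, \lambda(x - x_*)).$$

Here $\lim_{\lambda \to 0} \beta(x_*, \lambda(x - x_*)) = 0$. Dividing by $\lambda^2$ and letting $\lambda$ go
to zero, we arrive at the inequality (4.8). ///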


As before, one can define the point-set mapping

$$W_2(x_*) = \operatorname{Arg\,min}_{x \in X} (x - x_*)^T f_{xx}(x_*)(x - x_*)$$

at the stationary point $x_*$. To find the set $W_2(x_*)$, one needs in
this case to solve the problem of minimization of the quadratic
form on the set $X$. It is possible to pose a problem of finding
fixed points of this mapping or a problem of solving the quadratic
variational inequalities

$$0 \le (x - x_*)^T f_{xx}(x_*)(x - x_*) \quad \forall x \in X,$$

defining thereby the points satisfying the second-order condition
for a minimum.


5. PROPERTIES OF MINIMAX PROBLEMS 


1. THE STATEMENT OF MINIMAX PROBLEMS


Let $F(x,y)$ be a continuous function defined for all $x \in E^n$ and
$y \in E^m$. First, we pose the problems of finding the unconstrained
maximin and minimax

$$V_1 = \max_{y \in E^m} \min_{x \in E^n} F(x, y), \qquad (5.1)$$

$$V_2 = \min_{x \in E^n} \max_{y \in E^m} F(x, y). \qquad (5.2)$$


Using the second problem as an example, we explain the meaning of
the expression used. For each fixed $x \in E^n$ we find first the
image of the point-set mapping $B(x)$:

$$B(x) = \operatorname{Arg\,max}_{y \in E^m} F(x, y). \qquad (5.3)$$

Next we define the function of the maximum

$$\varphi(x) = F(x, B(x)), \qquad (5.4)$$

where $F(x, B(x))$ signifies that the value $F(x,y)$ with $y \in B(x)$
is being calculated. It follows from the definition (5.3) that
$F(x,y)$ attains the same value for any $y \in B(x)$, which justifies
this form of notation.

Next we define the set

$$X_* = \operatorname{Arg\,min}_{x \in E^n} \varphi(x).$$

Finding $X_*$ and the point-set mapping $B(x)$ is said to be
the synthesis of the problem (5.2); at least one pair of points
$[x_*, y_*]$, where $x_* \in X_*$, $y_* \in B(x_*)$, is said to be the (global)
solution of the minimax problem (5.2). The quantity $V_2 = F(x_*, y_*)$
is said to be the minimax estimate.

By an interior problem we mean the determination of at least 
one of the elements of the set B(x) for each x. By an exter- 
ior problem we mean the determination of at least one point of X,. 

To make our discussion simpler, we assume in this section
that all the operations of seeking extrema have solutions. In
(5.2) this means that the multivalued mapping $B$ is defined for
any $x$, the set $X_*$ being nonempty.

THEOREM 1.5.1. If the problems (5.1) and (5.2) have solutions,
then

$$V_1 \le V_2. \qquad (5.5)$$

Proof. For any fixed $y \in E^m$ the inequality

$$\min_{x \in E^n} F(x, y) \le F(x, y)$$

is satisfied. Similarly, for each fixed $x \in E^n$,

$$F(x, y) \le \max_{y \in E^m} F(x, y).$$

Hence for any $y \in E^m$ and $x \in E^n$

$$\min_{x \in E^n} F(x, y) \le \max_{y \in E^m} F(x, y).$$

By virtue of the arbitrariness of $x$ and $y$ we obtain

$$\max_{y \in E^m} \min_{x \in E^n} F(x, y) \le \min_{x \in E^n} \max_{y \in E^m} F(x, y).$$
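The inequality between the maximin and the minimax can be observed on a finite grid, where both extrema certainly exist. The Python sketch below tabulates $F$ on grids of $x$ and $y$ values and evaluates both quantities; the function names, the grid and the example $F(x,y) = (x - y)^2$ are assumptions chosen for illustration, and the grid replaces the spaces $E^n$, $E^m$ of the text.

```python
def table_of(F, xs, ys):
    # table[i][j] = F(xs[i], ys[j]); rows index x, columns index y.
    return [[F(x, y) for y in ys] for x in xs]


def maximin(table):
    # max over y (columns) of the min over x (rows), as in (5.1).
    columns = range(len(table[0]))
    return max(min(row[j] for row in table) for j in columns)


def minimax(table):
    # min over x (rows) of the max over y (columns), as in (5.2).
    return min(max(row) for row in table)
```

For $F(x,y) = (x - y)^2$ on the grid $\{0, 0.5, 1\}$ in both variables the maximin equals $0$ and the minimax equals $0.25$, so the inequality is strict in this case.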


2. CONSTRAINED MINIMAX PROBLEMS 


In the problems (5.1) and (5.2) no constraints are imposed on $x$
and $y$; hence we may say these are problems of seeking an unconstrained
maximin and minimax. We consider the case where $x$ and
$y$ are restricted:

$$V_1 = \max_{y \in Y} \min_{x \in X} F(x, y), \qquad (5.6)$$

$$V_2 = \min_{x \in X} \max_{y \in Y} F(x, y). \qquad (5.7)$$

Solutions to these problems can be found in the same way as was
done above (see (5.3) and (5.4)):

$$B(x) = \operatorname{Arg\,max}_{y \in Y} F(x, y), \quad \varphi(x) = F(x, B(x)), \quad X_* = \operatorname{Arg\,min}_{x \in X} \varphi(x). \qquad (5.8)$$

For applications, one question is of great importance: which properties
of the function $F(x,y)$ change and which properties remain
intact after the operation of maximization is performed over
$y$, i.e., upon the transition of $F(x,y)$ to $\varphi(x)$? To answer
this question we consider first properties of continuity, convexity
and Lipschitz continuity.
To shorten the notation, let

$$z = [x, y], \qquad Z = X \times Y, \qquad F(z) = F(x, y).$$

We say that the function $F(z)$ satisfies a Lipschitz condition
on $Z$ with a constant $l$, the function $\varphi(x)$ satisfies a Lipschitz
condition on $X$ with a constant $l$, if the conditions

$$|F(z_1) - F(z_2)| \le l\|z_1 - z_2\|,$$

$$|\varphi(x_1) - \varphi(x_2)| \le l\|x_1 - x_2\| \qquad (5.9)$$

are respectively satisfied for any $z_1, z_2 \in Z$ and $x_1, x_2 \in X$.

THEOREM 1.5.2. If the function $F(x,y)$ is continuous on the direct
product of compact sets $Z = X \times Y$, then the function $\varphi(x)$
is continuous as well on $X$. If the function $F(z)$ satisfies a
Lipschitz condition on $Z$, so does the function $\varphi(x)$, with the
same constant. If for each $y \in Y$ the function $F(x,y)$ is convex
in $x$ on the convex set $X$, $\varphi(x)$ as well is convex on $X$.
Proof. The function $F(x,y)$, being continuous on the compact set
$Z$, is uniformly continuous on this set. Hence for any $\varepsilon > 0$
there exists $\delta(\varepsilon)$ such that if $\|x_1 - x_2\| < \delta(\varepsilon)$ for $x_1, x_2 \in X$,
then

$$|F(x_1, y) - F(x_2, y)| < \varepsilon$$

for any $y \in Y$. We consider the expression

$$|\varphi(x_1) - \varphi(x_2)| = |F(x_1, y_1) - F(x_2, y_2)|,$$

where $y_1 \in B(x_1)$, $y_2 \in B(x_2)$. We consider two cases. In case
one we have

$$0 \le \varphi(x_1) - \varphi(x_2).$$

Taking into account the definition of the set $B(x)$, we obtain

$$0 \le \varphi(x_1) - \varphi(x_2) \le F(x_1, y_1) - F(x_2, y_1) < \varepsilon.$$

If the function $F(z)$ satisfies a Lipschitz condition in $z$, it
satisfies as well a Lipschitz condition in $x$ uniformly in $y$;
hence

$$0 \le \varphi(x_1) - \varphi(x_2) \le l\|x_1 - x_2\|.$$

In case two

$$0 \le \varphi(x_2) - \varphi(x_1).$$

Similarly we obtain that

$$0 \le \varphi(x_2) - \varphi(x_1) \le F(x_2, y_2) - F(x_1, y_2) < \varepsilon,$$

and if $F$ satisfies a Lipschitz condition, then

$$0 \le \varphi(x_2) - \varphi(x_1) \le l\|x_1 - x_2\|.$$

Combining these two cases, we conclude that the inequality

$$|\varphi(x_1) - \varphi(x_2)| < \varepsilon$$

holds for any $x_1, x_2 \in X$ such that $\|x_1 - x_2\| < \delta(\varepsilon)$, i.e., the
function $\varphi(x)$ is continuous on the compact set $X$. If $F$ satisfies
a Lipschitz condition, then (5.9) holds.


If $F$ is convex in $x$, then we have the estimates

$$\varphi(\lambda x_1 + (1-\lambda)x_2) = \max_{y \in Y} F(\lambda x_1 + (1-\lambda)x_2, y) \le \max_{y \in Y} [\lambda F(x_1, y) + (1-\lambda)F(x_2, y)] \le \lambda\varphi(x_1) + (1-\lambda)\varphi(x_2)$$

for any $x_1, x_2 \in X$, $0 \le \lambda \le 1$, which proves the convexity of
the function $\varphi$. ///

The property of differentiability of the function $F$ is not
preserved in passing to the function $\varphi$. We have the following
theorem.


THEOREM 1.5.3 (Danskin-Dem'yanov). Let $X$ be an open set in $E^n$,
let $Y$ be a compact set in $E^m$, and let the function $F(x,y)$ together
with $F_x(x,y)$ be continuous on the set $X \times Y$. Then the function
$\varphi(x)$ defined by (5.8) has at each point $x \in X$ a derivative in
any direction $s$, $\|s\| = 1$, and

$$\frac{\partial \varphi(x)}{\partial s} = \max_{y \in B(x)} \langle F_x(x, y), s\rangle.$$
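The formula of Theorem 1.5.3 can be checked numerically on a one-dimensional example in which $\varphi$ is not differentiable. The Python sketch below takes the assumed function $F(x,y) = xy$ with $Y$ replaced by a finite grid on $[-1, 1]$, so that $\varphi(x) = |x|$; the example, the grid, the tolerance and the function names are illustrative assumptions.

```python
Y_GRID = [k / 100.0 - 1.0 for k in range(201)]  # finite stand-in for Y = [-1, 1]


def F(x, y):
    return x * y


def dF_dx(x, y):
    # Partial derivative of F(x, y) = x * y with respect to x.
    return y


def phi(x, ys):
    # Maximum function (5.8) restricted to the grid.
    return max(F(x, y) for y in ys)


def argmax_set(x, ys, tol=1e-12):
    # Grid version of B(x): all y attaining the maximum up to tol.
    best = phi(x, ys)
    return [y for y in ys if F(x, y) >= best - tol]


def directional_derivative(x, s, ys):
    # Right-hand side of the formula of Theorem 1.5.3 for scalar x.
    return max(dF_dx(x, y) * s for y in argmax_set(x, ys))
```

At $x = 0$ the set $B(0)$ is the whole grid and the derivative equals $1$ in both directions $s = 1$ and $s = -1$, in agreement with $\varphi(x) = |x|$; at $x = 2$ the set $B(2)$ reduces to $\{1\}$ and the two directional derivatives are $1$ and $-1$.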
The proof of this Theorem can be found in Danskin [1], and 
also in Dem'yanov and Malozemov [1]. In the latter the following 


theorem is proved which is a generalization of Theorem 1.4.1 to 


the case of minimization of a maximum function. 
THEOREM 1.5.4. Let $X$ be a closed convex set in $E^n$, let $Y$ be
a compact set in $E^m$ and let the function $F$ be continuous together
with $F_x$ on the set $X \times Y$. Then in order that the
function $\varphi(x)$ attain its minimum on $X$ at a point $x_* \in X$, it
is necessary and, in the case of convexity of $\varphi(x)$ on $X$, also
sufficient that

$$\inf_{x \in X} \max_{y \in B(x_*)} \langle F_x(x_*, y), x - x_*\rangle = 0.$$
To conclude, we give a theorem on existence of solutions of
the problems (5.6) and (5.7).
THEOREM 1.5.5. If the function $F(x,y)$ is continuous on the product
of compact sets $X \times Y$, then the solutions of the problems
(5.6) and (5.7) exist and the inequality

$$V_1 = \max_{y \in Y} \min_{x \in X} F(x, y) \le V_2 = \min_{x \in X} \max_{y \in Y} F(x, y) \qquad (5.10)$$

holds.
The proof of this Theorem follows almost word-for-word the 


proof of Theorem 1.5.1. 


3. SADDLE POINTS 


We say that the point $z_* = [x_*, y_*]$ is a saddle point of the
function $F(x,y)$ in the problems (5.6) and (5.7), if $x_* \in X$,
$y_* \in Y$ and the inequalities

$$F(x_*, y) \le F(x_*, y_*) \le F(x, y_*) \qquad (5.11)$$

hold for all $x \in X$, $y \in Y$.
LEMMA 1.5.1. If the problems (5.6) and (5.7) have solutions, then
in order that the equality

$$V_1 = F(x_*, y_*) = V_2 \qquad (5.12)$$

hold, it is necessary and sufficient that $z_*$ be the saddle point
of the function $F(x,y)$.
Proof. Let $[x_*, y_*]$ be a saddle point of the function $F(x,y)$.
Then

$$\max_{y \in Y} F(x_*, y) \le F(x_*, y_*) \le \min_{x \in X} F(x, y_*).$$

The following inequalities are obvious:

$$\min_{x \in X} \max_{y \in Y} F(x, y) \le \max_{y \in Y} F(x_*, y),$$

$$\min_{x \in X} F(x, y_*) \le \max_{y \in Y} \min_{x \in X} F(x, y).$$

Combining the last three groups of inequalities, we obtain that

$$\min_{x \in X} \max_{y \in Y} F(x, y) \le F(x_*, y_*) \le \max_{y \in Y} \min_{x \in X} F(x, y).$$

We compare this inequality with (5.10) and conclude that the
equality (5.12) holds.
Let (5.12) hold. We show that the point $[x_*, y_*]$ is a saddle point.
From (5.12) there follows

$$F(x_*, y_*) = \min_{x \in X} \max_{y \in Y} F(x, y) = \max_{y \in Y} F(x_*, y) \ge F(x_*, y),$$

$$F(x_*, y_*) = \max_{y \in Y} \min_{x \in X} F(x, y) = \min_{x \in X} F(x, y_*) \le F(x, y_*).$$

Combining these inequalities, we obtain that (5.11) holds, that
is, $[x_*, y_*]$ is the saddle point of the function $F(x,y)$. ///
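On finite sets $X$ and $Y$ the statement of the Lemma can be verified by direct enumeration. The Python sketch below lists the saddle points of a tabulated function according to (5.11) and compares the maximin with the minimax; the table layout (rows indexed by $x$, columns by $y$), the exact floating-point comparison and the function names are assumptions made for illustration.

```python
def maximin(table):
    # Value (5.6): max over columns (y) of the column minimum over rows (x).
    return max(min(row[j] for row in table) for j in range(len(table[0])))


def minimax(table):
    # Value (5.7): min over rows (x) of the row maximum over columns (y).
    return min(max(row) for row in table)


def saddle_points(table):
    # Index pairs (i, j) satisfying (5.11): table[i][j] is the largest
    # entry of row i (maximum in y) and the smallest entry of column j
    # (minimum in x).
    points = []
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            column = [r[j] for r in table]
            if value == max(row) and value == min(column):
                points.append((i, j))
    return points
```

With both variables on the grid $\{-1, -0.5, 0, 0.5, 1\}$, the function $x^2 - y^2$ has the single saddle point $[0, 0]$ and equal maximin and minimax, whereas $(x - y)^2$ has no saddle point and its maximin is strictly smaller than its minimax.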
DEFINITION 1.5.1. We say that the function $F(x,y)$ is convex-concave,
if $F(x,y)$ is convex in $x$ for all fixed $y \in Y$ and
$F(x,y)$ is concave in $y$ for all fixed $x \in X$. The function
$F(x,y)$ is said to be strictly convex-concave, if for any fixed
$y$ it is strictly convex in $x$, and for any fixed $x$ it is strictly
concave in $y$.
THEOREM 1.5.6 (Von Neumann). Let the convex-concave function 
F(x,y) be continuous on the direct product of convex compact sets 
X and Y. Then the function F(x,y) has a saddle point. 

The proof of this theorem can be found, for instance, in 


Neumann and Morgenstern [1], and in Davydov [2]. 
4. THE NOTION OF LOCAL SOLUTIONS


If solving an interior or an exterior problem is treated as that
of finding a local extremum, we arrive at the notion of local
solutions of a minimax or a maximin problem. Such solutions are
obtained if, for example, we attempt to solve the problem (5.2)
numerically, using the methods of local unconstrained minimization
and maximization.
DEFINITION 1.5.2. Let the points $x_*$ and $y_*$ have respectively
the neighborhoods $G(x_*)$ and $G(y_*)$ such that for any vectors
$x \in G(x_*)$ the condition

$$g(x) = \operatorname{Arg\,max}_{y \in G(y_*)} F(x, y)$$

defines the single-valued function $g(x)$, $x_*$ being the point of
the strict minimum of the function $F(x, g(x))$ on $G(x_*)$ and
$y_* = g(x_*)$. Then we say that $z_* = [x_*, y_*]$ is a strict local
minimax point of the problem (5.2).


If $z_*$ is a strict local minimax point, the strict inequalities

$$F(x_*, g(x_*)) < F(x, g(x)), \qquad (5.13)$$

$$F(x, y) < F(x, g(x)) \qquad (5.14)$$

are satisfied for any $x \in G(x_*)$, $x \ne x_*$, $y \in G(y_*)$, $y \ne g(x)$.
The inequality (5.14) is satisfied as well for $x = x_*$, $y \ne y_*$.
In a similar manner one can define solutions of the problems
(5.1) and (5.2): local or global in $x$; local or global in $y$;
strict or non-strict in $x$; or strict or non-strict in $y$. The
definitions given of local solutions are a generalization of
well-known notions of local extrema. The minimax problem (5.2) may
have several local solutions. However, after they have been found,
it is not possible to assert that there is a global solution
among those local ones: one needs to consider all global solutions
in $y$, all local solutions in $x$, and choose among them the
solution for which the value of $F(x,y)$ is minimal. To study
and find local solutions is crucial for the class of functions
defined below, in which the points of the local minimax are at
the same time points of the global minimax. For the maximin
problems (5.1), local solutions are to be defined in a similar way.
DEFINITION 1.5.3. Let the points $x_*$ and $y_*$ have respectively
neighborhoods $G(x_*)$ and $G(y_*)$ such that for any vectors
$y \in G(y_*)$ the condition

$$d(y) = \operatorname{Arg\,min}_{x \in G(x_*)} F(x, y)$$

defines the single-valued function $d(y)$, $y_*$ being the strict
maximum point of the function $F(d(y), y)$ on $G(y_*)$ and
$d(y_*) = x_*$. Then $z_* = [x_*, y_*]$ is said to be a strict local
maximin point of the problem (5.1). The inequalities (5.13),
(5.14) are to be replaced in this case by

$$F(d(y), y) < F(d(y_*), y_*), \qquad F(d(y), y) < F(x, y). \qquad (5.15)$$


DEFINITION 1.5.4. The function $F(x,y)$ has a strict local saddle
at the point $z_*$, if $z_*$ is at the same time a strict local
minimax and maximin point.
Combining the inequalities (5.13) - (5.15), we obtain that at
the saddle point the inequalities

$$F(x_*, y) < F(x_*, y_*) < F(x, g(x)),$$

$$F(d(y), y) < F(x_*, y_*) < F(x, y_*) \qquad (5.16)$$

are satisfied, where $y \in G(y_*)$, $y \ne y_*$, $x \in G(x_*)$, $x \ne x_*$.


5. NECESSARY AND SUFFICIENT CONDITIONS FOR A LOCAL MINIMAX 


Introduce the square symmetric matrix of dimension $n \times n$:

$$C(x, y) = F_{xx}(x, y) - F_{xy}(x, y) F_{yy}^{-1}(x, y) F_{yx}(x, y).$$


THEOREM 1.5.7. Let the function $F(z)$ be twice continuously
differentiable in some neighborhood $G(z_*)$ of the point $z_*$. In
order that in the problem (5.2) the vector $z_*$ be a strict local
minimax point, it is sufficient that the conditions

$$F_x(z_*) = 0_n, \quad F_y(z_*) = 0_m, \quad F_{yy}(z_*) < 0, \quad C(z_*) > 0 \qquad (5.17)$$

be satisfied. If in the problem (5.2) the vector $z_*$ is a strict
local minimax point, it is necessary that $F_x(z_*) = 0_n$,
$F_y(z_*) = 0_m$, $F_{yy}(z_*) \le 0$, and if $F_{yy}(z_*) < 0$, then $C(z_*) \ge 0$.
Proof. To prove sufficiency, one needs to analyze the inequalities
(5.13) and (5.14). It follows from $F_y(z_*) = 0_m$,
$F_{yy}(z_*) < 0$ that $y_*$ is a strict local maximum point of
the function $F(x_*, y)$ in $y$. Since the matrix $F_{yy}(z_*)$ is negative
definite, there exists some neighborhood $G(z_*) = G(x_*) \times G(y_*)$
of the point $z_*$, where the matrix $F_{yy}(z)$ also is negative
definite and for all $x \in G(x_*)$ the implicit equation
$F_y(x, y) = 0$ defines uniquely the continuously differentiable
function $y = g(x)$, $y_* = g(x_*)$. This function can be obtained as
a solution of the Cauchy problem for the following system of
quasilinear equations

$$\frac{dg}{dx} F_{yy}(x, g(x)) + F_{xy}(x, g(x)) = 0 \qquad (5.18)$$

with $n$ independent variables $x = [x^1, \ldots, x^n]$ and the initial
conditions $g(x_*) = y_*$.

We denote by g(x) the solution to the system (5.18), de- 
fined on G(x,) and assuming a value of G(y,). The values of 
the vector function y = g(x) are local maximum points of the 
function F(x,y) in y. Thus, the inequality (5.14) holds. 
Next, substituting 'y © e{x). into. F(x,y), we obtain the maxi— 
mum function 


pCa des Chex yeg(x)) 
Differentiating 4 noting (5.18), we obtain 


dF (x, 
0, (x) = LEE) — F(x, g(x), 


do\T 
Per) = Feels BOD) + Pay (x (0) (2) = 
=F. .(%, g(x))—Fay(%, @(x)) Fag (%, &(*)) Pye (% (x) = 
=O(x, g(x)). 
In particular, for x = x,.. we; have C(x Vyas nence the matrix 


d, (x,) is positive definite and x, is a strict local minimum 


XX 
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point of the function oC), sand) Conl3)) holds.) PAnaLlovousscon— 
Siderations yield the necessary conditions for a local minimax 
point formulated in the Theotem. Vasey, 

Sufficient conditions of the Theorem can be stated as sufficient
conditions for the global solution to (5.2). In this case,
more rigid requirements ought to be imposed on $F(x,y)$. Assuming
that for each $x \in E^n$ the interior problem in (5.2) has a
solution and the exterior problem is solvable, we formulate the
following sufficient condition for global minimax.

THEOREM 1.5.8. Let the function $F(x,y)$ for each fixed $x \in E^n$
be strictly concave in $y$ and let the function $F(x, g(x))$ be
strictly convex in $x$. Then the point of local minimax in (5.2)
is at the same time the global solution of the problem (5.2).

Necessary and sufficient conditions for a maximin in the
problem (5.1) will be similar.

THEOREM 1.5.9. Let the function $F(z)$ be twice continuously
differentiable in a neighborhood of the point $z_*$. In order for $z_*$
to be a strict local maximin point of the problem (5.1), it is
sufficient that the conditions

$$F_x(z_*) = 0, \quad F_y(z_*) = 0, \quad F_{xx}(z_*) > 0,$$
$$N(z_*) = F_{yx}(z_*)\,F_{xx}^{-1}(z_*)\,F_{xy}(z_*) - F_{yy}(z_*) > 0$$

be satisfied.
6. CRITERION FOR MINIMAX DEFINITENESS 


If the function $F(x,y)$ is differentiable everywhere in $x$ and
$y$, the conditions of Theorem 1.5.8 can be written in analytical
form: for any $x_1$ and $x_2$ of $E^n$, $x_1 \ne x_2$, and $y_1$, $y_2$ of
$E^m$, $y_1 \ne y_2$, it is necessary that the inequalities

$$\langle F_x(x_1, g(x_1)),\, x_2 - x_1 \rangle < F(x_2, g(x_2)) - F(x_1, g(x_1)),$$
$$\langle F_y(x_1, y_1),\, y_2 - y_1 \rangle > F(x_1, y_2) - F(x_1, y_1) \tag{5.19}$$

be satisfied, where

$$g(x_1) = \operatorname{Arg}\max_{y \in E^m} F(x_1, y), \qquad g(x_2) = \operatorname{Arg}\max_{y \in E^m} F(x_2, y).$$

If the function $F(x,y)$ is differentiable and strictly
convex-concave, the conditions (5.19) will a priori be satisfied.
Indeed, for such functions the second inequality holds and the
first inequality follows from

$$\langle F_x(x_1, g(x_1)),\, x_2 - x_1 \rangle < F(x_2, g(x_1)) - F(x_1, g(x_1)) \le$$
$$\le \max_{y} F(x_2, y) - F(x_1, g(x_1)) = F(x_2, g(x_2)) - F(x_1, g(x_1)).$$

If the function $F(x,y)$ is twice continuously differentiable,
a sufficient condition is: for any $x \in E^n$ and $y \in E^m$ the
conditions

$$F_{yy}(x, y) < 0, \qquad \Phi(x, g(x)) > 0$$

are satisfied. The latter condition will a priori be satisfied
if a stronger condition holds, the verification of which does not
require the knowledge of the function $g(x)$: for any $x \in E^n$,
$y \in E^m$ the conditions

$$F_{yy}(x, y) < 0, \qquad \Phi(x, y) > 0$$

are satisfied.


We write these conditions in a different form and, to do this,
introduce the square symmetric matrix

$$R(z) = \begin{bmatrix} F_{yy}(z) & F_{yx}(z) \\ F_{xy}(z) & F_{xx}(z) \end{bmatrix},$$

where the matrix $R(z)$ is partitioned into four blocks.
We denote the sequential principal minors of the matrix $R(z)$ by
$\Delta_1(z), \Delta_2(z), \ldots, \Delta_{n+m}(z)$. In particular, $\Delta_m(z) = |F_{yy}(z)|$,
$\Delta_{n+m}(z) = |R(z)|$.

Now we prove the following

CRITERION. In order that the matrices $-F_{yy}(z)$, $\Phi(z)$ be
positive definite, it is necessary and sufficient that the
inequalities

$$(-1)^i \Delta_i(z) > 0, \quad i \in [1:m], \tag{5.20}$$
$$(-1)^m \Delta_{m+j}(z) > 0, \quad j \in [1:n], \tag{5.21}$$

hold for the principal minors of the matrix $R(z)$.
The equivalence of the inequalities (5.20) with the negative
definiteness of the matrix $F_{yy}(z)$ follows from Sylvester's
criterion. In order to prove (5.21), we express the principal minors
$\Delta_{m+j}$ of the matrix $R(z)$ in the form

$$\Delta_{m+j}(z) = \begin{vmatrix} F_{yy}(z) & F_{y x_j}(z) \\ F_{x_j y}(z) & F_{x_j x_j}(z) \end{vmatrix}, \tag{5.22}$$

where $F_{y x_j}$ denotes an $m \times j$-dimensional matrix and $F_{x_j x_j}$ denotes
a $j \times j$-matrix. Furthermore, $\partial^2 F / \partial y^q \partial x^s$ and
$\partial^2 F / \partial x^q \partial x^s$ lie on the
intersection of the $q$-th row and $s$-th column of these matrices,
respectively. Next, we multiply the upper row of the block matrix
(5.22) on the left by the $j \times m$-dimensional matrix $F_{x_j y} F_{yy}^{-1}$ and
subtract the result from the lower row. The determinant $\Delta_{m+j}$
does not change, as we know; then we obtain

$$\Delta_{m+j}(z) = |F_{yy}(z)| \cdot |F_{x_j x_j}(z) - F_{x_j y}(z)\,F_{yy}^{-1}(z)\,F_{y x_j}(z)|. \tag{5.23}$$

Let $\delta_j(z)$, $j \in [1:n]$, denote the principal minors of the
matrix $\Phi(z)$. From (5.23) it follows that $\delta_j(z)$ are connected
with the principal minors of the matrix $R(z)$ through

$$(-1)^m \Delta_{m+j}(z) = (-1)^m \Delta_m(z)\,\delta_j(z).$$

Thus, if the inequalities (5.20) and (5.21) hold, the principal
minors of the matrix $\Phi(z)$ are positive and, therefore, the
matrix $\Phi(z)$ is positive definite. The converse is also true,
that is, the positive definiteness of $\Phi(z)$ and $-F_{yy}(z)$ implies
the conditions (5.20) and (5.21). ///
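
The criterion can be checked mechanically on a small matrix. The sketch below is only an illustration under stated assumptions: the helper names are invented here, the determinant uses Laplace expansion (adequate only for small matrices), and the sample $R$ corresponds to the illustrative function $F(x,y) = x_1^2 + x_2^2 + 2x_1 y - y^2$ with $m = 1$, $n = 2$.

```python
def det(a):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


def leading_minors(a):
    """Sequential principal minors Delta_1, ..., Delta_{n+m}."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def criterion_holds(r, m):
    """Check (5.20) and (5.21) for R partitioned with the y-block (size m) first."""
    d = leading_minors(r)
    n = len(r) - m
    ok_520 = all((-1) ** i * d[i - 1] > 0 for i in range(1, m + 1))
    ok_521 = all((-1) ** m * d[m + j - 1] > 0 for j in range(1, n + 1))
    return ok_520 and ok_521


# R = [[F_yy, F_yx], [F_xy, F_xx]] for the illustrative F above.
R = [[-2, 2, 0],
     [2, 2, 0],
     [0, 0, 2]]

if __name__ == "__main__":
    print(leading_minors(R), criterion_holds(R, 1))
```

Here $\Phi = \operatorname{diag}(4, 2)$, and its principal minors 4 and 8 equal the ratios $\Delta_2/\Delta_1$ and $\Delta_3/\Delta_1$, in agreement with (5.23).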

One particular case is of interest: when $y$ is scalar
($m = 1$), the conditions (5.20) and (5.21) become

$$\Delta_i(z) < 0, \quad i \in [1:n+1].$$

According to the results of Section 3, in order that the stationary
point $z$ of the function $F(z)$ be an unconstrained local
minimum, it is sufficient that the principal minors of the matrix
$R(z)$ be strictly positive:

$$\Delta_i(z) > 0, \quad i \in [1:n+1].$$

In order that the stationary point $z$ be an unconstrained
local maximum of the function $F(z)$, it is sufficient that the
conditions

$$(-1)^i \Delta_i(z) > 0, \quad i \in [1:n+1],$$

be satisfied. The remaining, fourth version of the conditions

$$(-1)^i \Delta_i(z) < 0, \quad i \in [1:n+1],$$

corresponds to the sufficient maximin conditions in the problem

$$\max_{x \in E^n} \min_{y \in E^1} F(x, y).$$
The case considered demonstrates a close connection between 
the criterion proved and Sylvester's criterion. We can elaborate 
this connection even more, introducing notions similar to those of 


positive as well as of negative definiteness. For instance, the
quadratic form

$$z^T R(z_*)\, z = x^T F_{xx}(z_*)\, x + 2 x^T F_{xy}(z_*)\, y + y^T F_{yy}(z_*)\, y$$

can be called minimax definite at the point $z_*$ if for any
$x \ne 0$, $y \ne 0$ we have the inequalities

$$y^T F_{yy}(z_*)\, y < 0 < \max_{y} \left[ x^T F_{xx}(z_*)\, x + 2 x^T F_{xy}(z_*)\, y + y^T F_{yy}(z_*)\, y \right].$$

The maximum of the right side of the inequality in $y$ is attained
for $y = -F_{yy}^{-1}(z_*)\, F_{yx}(z_*)\, x$. Hence this inequality is equivalent
to the inequality

$$y^T F_{yy}(z_*)\, y < 0 < x^T \Phi(z_*)\, x.$$

The criterion proved above provides necessary and sufficient
conditions for minimax definiteness of the quadratic form. The
function $F(z)$ can be called strictly minimax if for all
$z_* \in E^{n+m}$ the quadratic form $z^T R(z_*)\, z$ is minimax definite. For
these functions the conditions of Theorem 1.5.8 are satisfied,
and their local minimax solution is simultaneously global.


The class of functions thus introduced can be compared with
the well-known classes of convex and convex-concave functions.
Indeed, if the local minimum exists in strictly convex functions,
then it is unique and it coincides with the global minimum. If
the local saddle exists in convex-concave functions, then it is
unique and it coincides with the global saddle. If the local
minimax exists in strictly minimax functions, then it is unique and
it is the global minimax. The class introduced is richer than the
class of strictly convex-concave functions (in which $y^T F_{yy}(z)\, y < 0$,
$x^T F_{xx}(z)\, x > 0$), because the former class contains the latter. It
is important to have introduced this class of functions because
for them the problems (5.2) can be solved via iterative local numerical
methods.

We consider next two simple examples to illustrate the proper- 
ties obtained of minimax problems. We assume that x and y are 
scalar. 

EXAMPLE 1. Let 

RES yy) = ase sin 27(x-y) 
By the periodicity of the function in $y$, we can restrict ourselves
to an interval $0 \le y < 1$. The stationary points are $x = 0$,
$y = 0.25$ and $x = 0$, $y = 0.75$. The first pair is a solution of
the problem (5.2), in which $V_2 = 1$. Eq. (5.18) here has the form


= x - 
Ae ft: “+ = cot 21(y-x) 


The straight line passing through the first point is given by

$$y = 0.25 + x. \tag{5.24}$$

It is not difficult to show that the point thus found is the
global solution of a minimax problem, and the point $x = 0$ and
the function (5.24) yield in the problem a (global) synthesis.
The second point $x = 0$, $y = 0.75$ is a local and, simultaneously,
global solution of the maximin problem (5.1), in which $V_1 = -1$.

No point among those found is a saddle.


EXAMPLE 2. For $F(x,y)$ we take the quadratic form:

$$F(x, y) = -x^2 + 2kxy - y^2, \qquad |k| > 1.$$

In the problem of finding the minimax, the global solution is
given by $x = y = 0$, $V_2 = 0$. Eq. (5.18) has a solution given by
the linear function $y = kx$. The maximin in the given problem is
not attainable on a bounded set.


7. A PARTICULAR CASE 
We consider here the simplest version of a minimax problem in
which we seek

$$\min_{x \in E^n} \varphi(x), \qquad \varphi(x) = \max_{i \in [1:c]} f^i(x), \tag{5.25}$$

where $\{f^i(x)\}$ is a finite set of functions. The theorem of
Danskin and Dem'yanov reads here as follows:

THEOREM 1.5.10. Let the functions $f^i(x)$, $i \in [1:c]$, be
continuously differentiable in some neighborhood of the point $x$.
Then the function of the maximum $\varphi(x)$ is differentiable at $x$
in any direction $q$, $\|q\| = 1$, with

$$\frac{\partial \varphi(x)}{\partial q} = \max_{i \in B(x)} \langle f^i_x(x),\, q \rangle,$$

$$B(x) = \{ i \in [1:c] : \varphi(x) = f^i(x) \}.$$
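
The formula of Theorem 1.5.10 translates directly into a short computation. The sketch below is illustrative only: the function names, the tolerance used to decide which functions are active, and the sample functions $f^1(x) = x$, $f^2(x) = -x$ (so that $\varphi(x) = |x|$) are assumptions made here.

```python
def active_set(fs, x, tol=1e-9):
    """Indices i with f_i(x) equal (within tol) to the maximum value."""
    values = [f(x) for f in fs]
    top = max(values)
    return [i for i, v in enumerate(values) if top - v <= tol]


def directional_derivative(fs, grads, x, q):
    """max over active i of <grad f_i(x), q>, as in Theorem 1.5.10."""
    return max(
        sum(gk * qk for gk, qk in zip(grads[i](x), q))
        for i in active_set(fs, x)
    )


# Illustrative data: phi(x) = max(x, -x) = |x| on E^1.
fs = [lambda x: x[0], lambda x: -x[0]]
grads = [lambda x: [1.0], lambda x: [-1.0]]

if __name__ == "__main__":
    for q in ([1.0], [-1.0]):
        print(q, directional_derivative(fs, grads, [0.0], q))
```

At $x = 0$ both functions are active and the derivative equals 1 in either direction, so the infimum over unit directions is nonnegative and $x = 0$ is a stationary point of $\varphi$.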


If all the functions $f^i$ are continuously differentiable, the
necessary condition for the minimum in the problem (5.25) is the
following:

THEOREM 1.5.11. In order that $x_*$ be the point of the minimum of
the function $\varphi(x)$ on $E^n$, it is necessary, and in the case
where $\varphi(x)$ is convex, is also sufficient that the inequality

$$\inf_{\|q\| = 1} \; \max_{i \in B(x_*)} \langle f^i_x(x_*),\, q \rangle \ge 0 \tag{5.26}$$

be satisfied, or, which is the same,

$$\inf_{\|q\| = 1} \frac{\partial \varphi(x_*)}{\partial q} \ge 0. \tag{5.27}$$


The points $x_*$ satisfying (5.26), (5.27) are said to be
stationary points of the function of the maximum $\varphi(x)$. Let
$L_\varphi(x)$ denote the convex hull spanned by the vectors $f^i_x(x)$,
where $i \in B(x)$:

$$L_\varphi(x) = \Big\{ z \in E^n : z = \sum_{i \in B(x)} \alpha_i f^i_x(x), \;\; \alpha_i \ge 0, \;\; \sum_{i \in B(x)} \alpha_i = 1 \Big\}.$$

THEOREM 1.5.12. In order that the function $\varphi(x)$ attain the
minimum at the point $x_*$, it is necessary, and in the case where
$\varphi(x)$ is convex is sufficient, that

$$0 \in L_\varphi(x_*).$$

This condition is a generalization of the condition (3.4) to
maximum functions.

6. CONDITIONS FOR A MINIMUM IN NONLINEAR
PROGRAMMING PROBLEMS WITHOUT DIFFERENTIABILITY


1. BASIC NOTIONS


The general problem of nonlinear programming consists in finding

$$\min_{x \in X} f(x), \tag{6.1}$$

where the constraint set $X$ is defined by the following condition:

$$X = \{ x \in E^n : g(x) = 0, \; h(x) \le 0 \}. \tag{6.2}$$

Here we have introduced two vector-function mappings

$$g : E^n \to E^e, \qquad h : E^n \to E^c.$$

We shall say that the function $f(x)$ and the vector functions
$g$ and $h$ are the functions defining the problem (6.1). The
vector function $g$ gives constraints of equality type and $h$
constraints of inequality type. We denote by $X_*$ the set of all
solutions to the problem (6.1); $X_*$ is called the solution set.

Now we define the set

$$X_0 = \{ x \in E^n : g(x) = 0, \; h(x) < 0 \}. \tag{6.3}$$

Next we introduce the so-called Lagrange function or simply
Lagrangian:

$$L(x, u, v) = f(x) + \langle u, g(x) \rangle + \langle v, h(x) \rangle. \tag{6.4}$$

The vectors $u \in E^e$ and $v \in E^c$ are called Lagrange
multipliers or dual vectors.


DEFINITION 1.6.1. Any point $x \in X$ is called a feasible point.

DEFINITION 1.6.2. At the point $[x_*, v_*] \in E^{n+c}$ the
complementarity condition is satisfied if

$$v^j_*\, h^j(x_*) = 0, \quad j \in [1:c]. \tag{6.5}$$

DEFINITION 1.6.3. At the point $[x_*, v_*] \in E^{n+c}$ the strict
complementarity condition is satisfied if the condition (6.5) holds and
for any $j \in [1:c]$ the condition $h^j(x_*) = 0$ implies that
$v^j_* > 0$.

DEFINITION 1.6.4. The problem (6.1) is called a convex programming
problem if the functions $f$ and $h$ are convex and the vector
function $g(x)$ is affine in $x$.

DEFINITION 1.6.5. The problem (6.1) is called a linear programming
problem if the functions defining the problem are affine in
$x$.

DEFINITION 1.6.6. In the problem (6.1) Slater's constraint
qualification (CQ) is satisfied if the set $X_0$ is not empty.

DEFINITION 1.6.7. In the problem (6.1) Karlin's constraint
qualification (CQ) is satisfied if there exist no vectors $u \in E^e$,
$v \in E^c_+$, $\|v\| \ne 0$, for which the inequality

$$0 \le \langle u, g(x) \rangle + \langle v, h(x) \rangle \tag{6.6}$$

is satisfied for any $x \in E^n$.

If Slater's CQ is satisfied, Karlin's CQ holds. Indeed, at
least for $x \in X_0$ we then have $g(x) = 0$, $h(x) < 0$, and for
any $u \in E^e$, $v \in E^c_+$, $\|v\| \ne 0$ the inequality

$$\langle u, g(x) \rangle + \langle v, h(x) \rangle < 0,$$

being the converse of the required (6.6).

In Section 5 we shall show that Karlin's CQ is equivalent
to Slater's CQ in convex programming problems.


2. SUFFICIENT CONDITIONS FOR A MINIMUM 


Here we formulate and prove theorems providing sufficient
conditions for a minimum in the problem (6.1). Consider two auxiliary
problems of finding the maximin and minimax of the Lagrange
function:

$$V_1 = \sup_{u \in E^e} \; \sup_{v \in E^c_+} \; \inf_{x \in E^n} L(x, u, v), \tag{6.7}$$

$$V_2 = \inf_{x \in E^n} \; \sup_{u \in E^e} \; \sup_{v \in E^c_+} L(x, u, v). \tag{6.8}$$

We say that the point $[x_*, u_*, v_*] \in E^{n+m}$, where $v_* \ge 0$,
$m = e + c$, is a saddle point of the function $L$ if for any
$x \in E^n$, $u \in E^e$, $v \in E^c_+$ the inequalities

$$L(x_*, u, v) \le L(x_*, u_*, v_*) \le L(x, u_*, v_*) \tag{6.9}$$

are satisfied.

THEOREM 1.6.1. Let $[x_*, u_*, v_*]$ be a saddle point of the
Lagrangian $L$. Then the point $x_*$ is a solution of the problem
(6.1) and $[x_*, v_*]$ satisfies the complementarity condition.

Proof. It follows from the left side of (6.9) that for all
$u \in E^e$, $v \in E^c_+$ the condition

$$\sum_{i=1}^{e} g^i(x_*)\,[u^i - u^i_*] + \sum_{j=1}^{c} h^j(x_*)\,[v^j - v^j_*] \le 0 \tag{6.10}$$

is satisfied. Let $v = v_*$ and $u^i = u^i_*$ for all $i$ except
only one component $u^k = u^k_* + 1$. Substituting these vectors into
(6.10), we obtain $g^k(x_*) \le 0$. We may put $u^k = u^k_* - 1$; and then
we arrive at the condition $g^k(x_*) \ge 0$. Only one possibility
remains: $g^k(x_*) = 0$. By the arbitrariness of $k$ we come to the
conclusion that $g(x_*) = 0$. Taking in (6.10) $u = u_*$ and $v^j = v^j_*$
for all $j$ except $v^s = v^s_* + 1$, we obtain that
$h^s(x_*) \le 0$. By the arbitrariness of $s$ from $[1:c]$ we infer
that $h(x_*) \le 0$. Thus, the point $x_* \in X$, that is, it is feasible.

Let us put in (6.10) $u = u_*$, $v = 0$; then we obtain

$$0 \le \langle h(x_*), v_* \rangle. \tag{6.11}$$

At the same time, $h(x_*) \le 0$ and $0 \le v_*$, hence for all
$j \in [1:c]$ the inequality $h^j(x_*)\, v^j_* \le 0$ is satisfied.
Comparing the last inequality with (6.11), we infer that (6.5)
holds true. From the right inequality of (6.9), taking into
account (6.5), we obtain that for all $x \in E^n$ the condition

$$f(x_*) \le f(x) + \langle u_*, g(x) \rangle + \langle v_*, h(x) \rangle$$

is satisfied. In particular, for any $x \in X$ we have $f(x_*) \le f(x)$.
Then we infer that $x_* \in X_*$. ///
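
The saddle-point inequalities (6.9) can be checked by sampling on a small problem. The following sketch is only an illustration under assumptions made here: the problem $\min x^2$ subject to $h(x) = 1 - x \le 0$ (no equality constraints), the candidate $x_* = 1$, $v_* = 2$, and the finite grids standing in for $E^1$ and $E^1_+$ are all chosen for the example, so the check is a necessary test on the sampled points rather than a proof.

```python
def f(x):
    return x * x


def h(x):
    return 1.0 - x  # constraint h(x) <= 0, i.e. x >= 1


def lagrangian(x, v):
    return f(x) + v * h(x)


def is_saddle_point(x_star, v_star, xs, vs, tol=1e-12):
    """Check L(x*, v) <= L(x*, v*) <= L(x, v*) on the sampled xs and vs."""
    center = lagrangian(x_star, v_star)
    left = all(lagrangian(x_star, v) <= center + tol for v in vs)
    right = all(center <= lagrangian(x, v_star) + tol for x in xs)
    return left and right


xs = [i / 10.0 for i in range(-50, 51)]   # sample of E^1
vs = [j / 10.0 for j in range(0, 101)]    # sample of v >= 0

if __name__ == "__main__":
    print(is_saddle_point(1.0, 2.0, xs, vs))
    print(is_saddle_point(1.0, 1.0, xs, vs))
```

For the pair $(1, 2)$ both inequalities hold and $v_* h(x_*) = 0$, in line with the theorem; for the multiplier $v = 1$ the right inequality fails at $x = 0.5$.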

If we assume that in (6.7) the interior problem is solvable,
we can introduce into consideration the function of dual vectors,
letting

$$\gamma(u, v) = \min_{x \in E^n} L(x, u, v).$$

Then the exterior problem for (6.7) consists in finding

$$V_1 = \sup_{u \in E^e} \; \sup_{v \in E^c_+} \gamma(u, v).$$

THEOREM 1.6.2. Let $X_* \ne \emptyset$; then we have the estimates

$$V_1 \le f(X_*) = V_2. \tag{6.12}$$

Proof. First we consider problem (6.8): the interior problem
therein is solvable in a simple way:

$$\beta(x) = \sup_{u \in E^e} \; \sup_{v \in E^c_+} L(x, u, v) =
\begin{cases} f(x) & \text{if } x \in X, \\ +\infty & \text{if } x \notin X. \end{cases}$$

Solving the exterior problem, we come to the conclusion that the
minimum of the function $\beta(x)$ is attained on the feasible set and
coincides with the minimum of the function $f(x)$ on $X$; hence

$$f(X_*) = \min_{x \in X} f(x) = V_2.$$

For any $x \in X$, $u \in E^e$, $v \in E^c_+$ we have the inequality

$$f(x) \ge L(x, u, v).$$

Next we minimize the left and the right sides of the last
inequality in $x$ on the set $X$; then

$$f(X_*) = \min_{x \in X} f(x) \ge \min_{x \in X} L(x, u, v) \ge \inf_{x \in E^n} L(x, u, v).$$

By the arbitrariness of $u \in E^e$, $v \in E^c_+$, we arrive at the left
side of (6.12). ///
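
The estimate (6.12) can be observed on the same kind of small problem. The sketch below assumes, for illustration only, the problem $\min x^2$ subject to $1 - x \le 0$, for which the interior minimum over $x$ is attained at $x = v/2$; the function name and the sampled multipliers are choices made here.

```python
def dual_function(v):
    """gamma(v) = min over x of x^2 + v*(1 - x); the minimizer is x = v/2."""
    x_min = v / 2.0
    return x_min * x_min + v * (1.0 - x_min)


PRIMAL_OPTIMUM = 1.0                      # f(X*) for min x^2 s.t. x >= 1
vs = [j / 10.0 for j in range(0, 101)]    # sampled multipliers v >= 0
best_dual = max(dual_function(v) for v in vs)

if __name__ == "__main__":
    print(best_dual, PRIMAL_OPTIMUM)
```

Every sampled value of $\gamma(v) = v - v^2/4$ stays at or below $f(X_*) = 1$, and the bound is attained at $v = 2$, so in this convex example $V_1 = V_2$.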


m 


THEOREM 1.6.3. Let the point $[x_*, u_*, v_*] \in E^{n+m}$, in which $x_* \in X$,
$v_* \ge 0$, be the solution of the problem (6.7), and let the
complementarity condition be satisfied. Then $x_* \in X_*$.

Proof. For $u = u_*$, $v = v_*$ the point $x_*$ must be the solution
of the interior problem in (6.7), i.e., the inequality

$$f(x_*) + \langle u_*, g(x_*) \rangle + \langle v_*, h(x_*) \rangle = f(x_*) \le
f(x) + \langle u_*, g(x) \rangle + \langle v_*, h(x) \rangle$$

is satisfied for any $x \in E^n$. If $x \in X$, this inequality
implies that

$$f(x_*) \le f(x) + \langle v_*, h(x) \rangle \le f(x),$$

whence we infer that $x_* \in X_*$. ///


3. COMPUTATIONAL ASPECTS 


The above Theorems suggest possible methods for constructing
numerical algorithms for solving the problem (6.1). From Theorem
1.6.1 it follows that when there exists a saddle point of the
Lagrangian $L$ it is possible to seek the saddle points of $L$
instead of solving the problem (6.1). We shall consider these
methods in Section 4.1. From Theorem 1.6.2 we infer that solving the
minimax problem (6.8) yields a solution of the problem (6.1).
This approach is quite feasible if we are certain that there
exists a bounded solution $[x, u, v]$ of the minimax problem. Let
$|u^i| \le t$ for all $i$ and $0 \le v^j \le t$ for all $j$; then we replace
the minimax problem (6.8) by the problem

$$P_t = \min_{x \in E^n} \; \max_{|u^i| \le t} \; \max_{0 \le v^j \le t} L(x, u, v). \tag{6.13}$$

Solving the interior problem, we obtain

$$P_t = \min_{x \in E^n} \Big\{ f(x) + t \Big[ \sum_{i=1}^{e} |g^i(x)| + \sum_{j=1}^{c} \max\{0,\, h^j(x)\} \Big] \Big\}. \tag{6.14}$$

Thus, having introduced only one hypothesis concerning the
existence of bounded solutions to (6.8), we reduced the initial problem
(6.1) to that of finding the unconstrained minimum of some
auxiliary function of $x$. In Chapter 3 we shall obtain the same
result while studying the method of penalty functions.
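
The auxiliary function in (6.14) is straightforward to evaluate. The sketch below is a hedged illustration, not the method of the later chapters: the crude grid search, the function names, and the test problem $\min x^2$ subject to $1 - x \le 0$ (whose multiplier is $v_* = 2$) are assumptions made here.

```python
def penalty(x, t, f, gs, hs):
    """f(x) + t * (sum |g_i(x)| + sum max(0, h_j(x))), the function in (6.14)."""
    violation = sum(abs(g(x)) for g in gs) + sum(max(0.0, h(x)) for h in hs)
    return f(x) + t * violation


def minimize_on_grid(t):
    """Crude grid search on [-2, 3] with step 0.001 for the illustrative problem."""
    f = lambda x: x * x
    hs = [lambda x: 1.0 - x]          # inequality constraint 1 - x <= 0
    grid = [i / 1000.0 for i in range(-2000, 3001)]
    return min(grid, key=lambda x: penalty(x, t, f, [], hs))


if __name__ == "__main__":
    print(minimize_on_grid(3.0))   # bound t above the multiplier v* = 2
    print(minimize_on_grid(1.0))   # bound t below the multiplier
```

With $t = 3$ the unconstrained minimizer coincides with the constrained solution $x = 1$; with $t = 1$ the hypothesis of a bounded dual solution within the box fails and the minimizer $x = 0.5$ is infeasible.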
Theorem 1.6.3 enables us to seek solutions of the maximin
problem (6.7). If its solution is given by the feasible point
($x_* \in X$) and the complementarity condition is satisfied, then
$x_*$ is the solution of the initial problem.
We shall use this property in Chapter 4.


4. MODIFIED LAGRANGIANS


All the sufficient conditions given in this section are based on 
considering auxiliary problems related to the Lagrange function L. 
Then the following questions arise: 

Is it possible to use auxiliary functions of a more general 
type than the Lagrangian L in order to obtain sufficient condi- 
tions for the minimum and, which is more essential, to construct 
numerical methods for solving the problem (6.1)? Will this tech- 
nique allow us to guarantee the solvability of auxiliary problems 
for a wider class of nonlinear programming problems? To which 
other auxiliary problems, convenient for numerical realization, 
can we reduce the problem (6.1)? 

Currently, intensive research is going on, and it will be
possible, apparently, to answer these questions in the future.
Theoretical investigations and numerical experiments indicate
that it is useful to consider new auxiliary functions usually
known as modified or generalized Lagrange functions and to
introduce thereby new auxiliary problems different from those of
finding saddle points, maximin, or minimax. We cite an unconventional
sufficient condition for the minimum in the problem (6.1), which
uses a special auxiliary function of the form:

$$H(x, y) = f(x) + \xi(x, y). \tag{6.15}$$

Here $y$ is a vector whose dimension is not essential at the moment
(in particular, $y$ may be scalar), and $\xi$ is a continuous function
of the arguments. Let

$$x(y) = \operatorname{Arg}\min_{x \in E^n} H(x, y). \tag{6.16}$$

Assuming that for a vector $y$ the set $x(y)$ is not empty,
we proceed to the problem of finding real solutions of the
following system:

$$g(x) = 0, \qquad h(x) \le 0, \qquad \text{where } x \in x(y). \tag{6.17}$$


The connection between this auxiliary problem and the initial
problem (6.1) is more fully unraveled by the following.

THEOREM 1.6.4 (Yu. G. Evtushenko [12]). Suppose there exist
vectors $y_*$ and $x_*$ satisfying (6.17) and that the function $\xi$ in
(6.15) is such that for any $x \in X$ we have the inequality

$$\xi(x, y_*) \le \xi(x_*, y_*). \tag{6.18}$$

Then:

1) $X_* \subset X \cap x(y_*)$; (6.19)

2) if $\xi(x, y_*)$ is constant on $X$, then
$X_* = X \cap x(y_*)$. (6.20)

Proof. Suppose there exist $x_*$, $y_*$ satisfying (6.17). Then
$x_* \in X \cap x(y_*)$ and for all $x \in E^n$

$$H(x_*, y_*) \le H(x, y_*). \tag{6.21}$$

We use (6.18). From (6.21) we obtain $f(x_*) \le f(x)$ and, taking
into account that $x$ is an arbitrary feasible point, we conclude
that $x_* \in X_*$.

Let $x' \in X_*$, $x' \ne x_*$; then $f(x') = f(x_*)$ and $x' \in X$.
One more time using (6.18), we obtain

$$H(x', y_*) = f(x') + \xi(x', y_*) \le f(x_*) + \xi(x_*, y_*) = H(x_*, y_*).$$

If the point $[x_*, y_*]$ satisfies (6.17), then the inequality
$H(x_*, y_*) \le H(x, y_*)$ holds for all $x \in E^n$ and in particular for
$x'$. Therefore, from the two inequalities

$$H(x', y_*) \le H(x_*, y_*) \le H(x', y_*)$$

we find that $H(x_*, y_*) = H(x', y_*)$, $x' \in x(y_*)$. We have proved
(6.19).

To prove (6.20) it suffices to show that

$$X \cap x(y_*) \subset X_*, \tag{6.22}$$

since the inverse inclusion has been proved. Let $x' \in x(y_*) \cap X$.
Then $H(x_*, y_*) = H(x', y_*)$; since the function $\xi(x, y_*)$ is
constant on $X$, we have $f(x_*) = f(x')$, thus proving (6.22). ///

Theorem 1.6.4 enables one to reduce the initial problem of
nonlinear programming to that of solving a system of nonlinear
equalities and inequalities; various numerical methods following
from our result here will be described in Section 4.2.

5. SOME AUXILIARY RESULTS


LEMMA 1.6.1. Let (6.1) be a convex programming problem and let
the set $X_0$ defined by (6.3) be empty. Then there exist vectors
$u \in E^e$, $v \in E^c_+$, $(u, v) \ne 0$, such that the inequality (6.6) holds
for any $x \in E^n$.

Proof. We define two sets

$$A(x) = \{ z, y : z \in E^e, \; y \in E^c, \; z = g(x), \; y > h(x) \}, \qquad A = \bigcup_{x \in E^n} A(x).$$

If the points $[z_1, y_1]$, $[z_2, y_2]$ belong to $A$, then, for
$0 \le \lambda \le 1$, we have

$$(1-\lambda) z_1 + \lambda z_2 = (1-\lambda)\, g(x_1) + \lambda\, g(x_2) = g((1-\lambda) x_1 + \lambda x_2),$$

$$(1-\lambda) y_1 + \lambda y_2 > (1-\lambda)\, h(x_1) + \lambda\, h(x_2) \ge h((1-\lambda) x_1 + \lambda x_2).$$

Hence the set $A$ is convex. From the condition of the lemma
$X_0 = \emptyset$ it follows that the origin $z = 0$, $y = 0$ does not
belong to the set $A$. By the Theorem on separability 1.1.1, there
exist $u \in E^e$, $v \in E^c$, $\|u, v\| \ne 0$, such that from the condition
$[a, b] \in A$ it follows that

$$0 \le \langle u, a \rangle + \langle v, b \rangle. \tag{6.23}$$

We can make the components of the vector $b$ as large as possible,
hence the vector $v \ge 0$. In (6.23) we put $a = g(x)$,
$b = h(x) + \varepsilon d$, $d \in E^c$, $d > 0$, $\|d\| = 1$, $\varepsilon > 0$; then $[a, b] \in A(x)$ and
for any $x \in E^n$ we have

$$0 \le \langle u, g(x) \rangle + \langle v, h(x) \rangle + \varepsilon \langle v, d \rangle.$$

Because $\varepsilon$ is an arbitrary positive number, we infer that (6.6)
is satisfied. ///

LEMMA 1.6.2. For convex programming problems Slater's CQ is
equivalent to Karlin's CQ.

Proof. With Definition 1.6.7, we pointed out that when Slater's
CQ is satisfied Karlin's CQ holds. We shall show that if Slater's
CQ is not satisfied, Karlin's CQ does not hold. By the previous
lemma the condition $X_0 = \emptyset$ implies that there exist $u \in E^e$,
$v \in E^c_+$, $(u, v) \ne 0$, such that the inequality (6.6) is satisfied.
If $\|v\| \ne 0$, Karlin's CQ is violated. We suppose the opposite,
that is, $v = 0$. The affine vector function $g(x)$ is representable
as $g(x) = Ax + b$, where $A$ is an $e \times n$ matrix. We can
assume without loss of generality that the rows of the matrix $A$
are linearly independent, since otherwise the number of
restrictions could be reduced without changing the feasible set. Let us
write (6.6) in the form

$$0 \le \langle Ax + b,\, u \rangle = x^T A^T u + \langle b, u \rangle, \qquad x \in E^n, \quad \|u\| \ne 0. \tag{6.24}$$

We show that (6.24) implies $A^T u = 0$. Otherwise, for $x$ we take

$$x = \begin{cases} -A^T u & \text{if } \langle b, u \rangle \le 0, \\[4pt]
-\dfrac{2 \langle b, u \rangle}{\|A^T u\|^2}\, A^T u & \text{if } \langle b, u \rangle > 0. \end{cases}$$

Substituting these expressions into the right side of (6.24), we
obtain that if $\langle b, u \rangle \le 0$, then

$$x^T A^T u + \langle b, u \rangle = -\|A^T u\|^2 + \langle b, u \rangle < 0.$$

If $\langle b, u \rangle > 0$, then $x^T A^T u + \langle b, u \rangle = -\langle b, u \rangle < 0$. But these
relations contradict the left-hand inequality in (6.24). Hence
$A^T u = 0$, which contradicts the linear independence of the rows
of the matrix $A$. Therefore, $\|v\| \ne 0$, and Karlin's CQ is not
satisfied. ///


6. NECESSARY AND SUFFICIENT CONDITIONS FOR A MINIMUM 


The three theorems given in Section 2 permit us to replace the
initial problem (6.1) by that of finding saddle points of the
Lagrangian, or by that of solving a minimax or a maximin problem.
Unfortunately, such reduction of (6.1) to simpler problems is not
a universal technique: not in all problems does the Lagrangian have
finite saddle points, nor are the conditions of Theorem 1.6.3 always
satisfied. We cite here some results providing sufficient
conditions for the existence of bounded dual vectors. In one of the
first works on nonlinear programming, F. John [1] suggested for
the analysis of the problem (6.1) that the function

$$L^0(x, u, v, q) = q f(x) + \langle u, g(x) \rangle + \langle v, h(x) \rangle \tag{6.25}$$

should be used.

We say that the point $[x_*, u_*, v_*, q_*] \in E^{n+m+1}$, in which
$q_* \in E^1_+$, $u_* \in E^e$, $v_* \in E^c_+$, $(u_*, v_*, q_*) \ne 0$, is a saddle
point of the function $L^0$ if for all $x \in E^n$, $u \in E^e$, $v \in E^c_+$
we have the inequalities

$$L^0(x_*, u, v, q_*) \le L^0(x_*, u_*, v_*, q_*) \le L^0(x, u_*, v_*, q_*). \tag{6.26}$$

Comparing these inequalities with (6.9), we infer that if
$[x_*, u_*, v_*, q_*]$ is a saddle point of the function $L^0$ and $q_* \ne 0$,
then the totality $[x_*, u_*/q_*, v_*/q_*]$ is the saddle point of the
function $L$. Conversely, if $[x_*, u_*, v_*]$ is a saddle point of the
function $L$, then $[x_*, u_*, v_*, 1]$ is a saddle point of the
function $L^0$.

The next theorem is an analog of Theorem 1.6.1 for the
function $L^0$.
THEOREM 1.6.5. Let $[x_*, u_*, v_*, q_*]$ be a saddle point of the
function $L^0$, $(u_*, v_*, q_*) \ne 0$. Then $x_* \in X$, (6.5) holds, and
either $q_* > 0$, $x_* \in X_*$, or $q_* = 0$ and furthermore in the
problem (6.1) Karlin's and Slater's CQ do not hold.

Proof. In the same way as in justifying Theorem 1.6.1, we can
show here that $x_* \in X$ and (6.5) holds. If $0 < q_*$, then
$x_* \in X_*$; if $q_* = 0$, then $(u_*, v_*) \ne 0$; from the right inequality
of (6.26) we obtain

$$0 \le \langle u_*, g(x) \rangle + \langle v_*, h(x) \rangle \quad \text{for all } x \in E^n,$$

indicating that Karlin's and Slater's CQ are violated. ///
THEOREM 1.6.6 (Uzawa-Karlin Saddle Point Theorem). Suppose in the
convex programming problem (6.1) the set of solutions is not empty.
Then for each $x_* \in X_*$ there exists a saddle point $[x_*, u_*, v_*, q_*]$
of the function $L^0$, and the complementarity condition is
satisfied at the point $[x_*, v_*]$.

Proof. The fact that $x_* \in X_*$ implies that the system

$$f(x) - f(x_*) < 0, \qquad g(x) = 0, \qquad h(x) \le 0 \tag{6.27}$$

has no solution. Let us use Lemma 1.6.1. Since the statement
"$h(x) \le 0$ has no solution" implies the statement "$h(x) < 0$ has
no solution," the assertion of Lemma 1.6.1 can be formulated as
follows: if the system (6.27) has no solution, there exist
$u_* \in E^e$, $v_* \in E^c_+$, $q_* \in E^1_+$, not equal to zero at the same time,
such that for all $x \in E^n$ the inequality

$$0 \le q_* (f(x) - f(x_*)) + \langle u_*, g(x) \rangle + \langle v_*, h(x) \rangle \tag{6.28}$$

is satisfied. Putting here $x = x_*$, we obtain that
$\langle v_*, h(x_*) \rangle \ge 0$. But since $v_* \ge 0$ and $h(x_*) \le 0$, we infer
that (6.5) holds. It follows from (6.28) that for all $x \in E^n$

$$L^0(x, u_*, v_*, q_*) \ge q_* f(x_*) + \langle u_*, g(x_*) \rangle + \langle v_*, h(x_*) \rangle = L^0(x_*, u_*, v_*, q_*),$$

which proves the right inequality in (6.26).
From the condition $h(x_*) \le 0$ we have $\langle v, h(x_*) \rangle \le 0$ for
$v \ge 0$. Noting (6.5), we obtain that for any $u \in E^e$ and $v \in E^c_+$

$$q_* f(x_*) + \langle u, g(x_*) \rangle + \langle v, h(x_*) \rangle \le
q_* f(x_*) + \langle u_*, g(x_*) \rangle + \langle v_*, h(x_*) \rangle,$$

i.e., the left inequality in (6.26) holds. ///
THEOREM 1.6.7 (Kuhn-Tucker Saddle-Point Theorem). Suppose that
in the convex programming problem (6.1) the set of solutions $X_*$
is not empty and either Slater's CQ or Karlin's CQ is satisfied.
Then for each $x_* \in X_*$ there exist vectors $u \in E^e$, $v \in E^c_+$ such
that $[x_*, u, v]$ is a saddle point of the Lagrangian $L$.

Proof. By the previous theorem the saddle point $[x_*, u_*, v_*, q_*]$ of
the function $L^0$ exists, and the condition (6.5) is satisfied at the
point $[x_*, v_*]$. If $q_* > 0$, then $[x_*, u_*/q_*, v_*/q_*]$ is the saddle
point of the function $L$.

We prove that $q_* > 0$. Assume the converse, that is, $q_* = 0$.
Then $(u_*, v_*) \ne 0$, and from the right side of (6.26) it follows
that for any $x \in E^n$ the inequality

$$0 \le \langle u_*, g(x) \rangle + \langle v_*, h(x) \rangle$$

is satisfied, which is impossible for $\|v_*\| \ne 0$, $v_* \ge 0$, since
it contradicts Karlin's and Slater's CQ. Arguing in the same way
as in proving Lemma 1.6.2, we can show that $\|v_*\| \ne 0$. Hence
$q_* \ne 0$, $u = u_*/q_*$, $v = v_*/q_*$. ///


7. GENERALIZATIONS


The results cited above may be generalized to the case where in
place of (6.1) we consider the problem of finding

$$\min_{x \in X \cap U} f(x), \tag{6.29}$$

where $U$ is a set in $E^n$ whose interior is not empty, and the set
$X$ is defined by the condition (6.2). The set of solutions of the
problem (6.29) is denoted as before by $X_*$, and the Lagrangian is
defined by the formula (6.4). In the problems (6.7), (6.8) and
(6.16) the condition $x \in U$ is used instead of the condition
$x \in E^n$. The conditions (6.9), (6.26) defining the saddle points
as well as the inequality (6.6) must hold for any $x \in U$. In
convex programming problems the set $U$ is required to be convex.

It is easy to see that the theorems and lemmas of this
section will still hold for the problem (6.29) if we take into
account the additional assertions made. In Theorems 1.6.6 and
1.6.7, and Lemmas 1.6.1 and 1.6.2, it is required that the set
$U$ be convex. The representation of a nonlinear programming
problem in the form (6.29) will be encountered frequently; it is
useful when for various reasons the operation of minimizing the
functions on the set $U$ is easily realizable. For example, $U$ is of
a simple structure, or the minimum on $U$ can be sought through
analytical formulas. The problem (6.29) can be more convenient
for theoretical investigations as well, since, imposing special
restrictions on $U$ (for example, requiring the set $U$ be compact),
we essentially simplify the problem and analysis of the methods
for solving it.


7. CONDITIONS FOR A MINIMUM IN NONLINEAR 


PROGRAMMING PROBLEMS WITH DIFFERENTIABILITY 


1. BASIC DEFINITIONS AND PRELIMINARY RESULTS 


We consider the problem (6.1), using the Lagrange function (6.4). 


DEFINITION 1.7.1. Let the functions defining the problem (6.1) be differentiable at a point $x_*$; then the point $[x_*, u_*, v_*] \in E^{n+m}$ is said to be a Kuhn-Tucker point if $x_* \in X$, at $[x_*, v_*] \in E^{n+c}$ the complementarity condition (6.5) is satisfied, $v_* \ge 0$ and
$$L_x(x_*, u_*, v_*) = f_x(x_*) + g_x(x_*)u_* + h_x(x_*)v_* = 0. \tag{7.1}$$


DEFINITION 1.7.2. Let the functions defining the problem (6.1) be differentiable at the point $x_*$. Then we say that the point $[x_*, u_*, v_*, q_*] \in E^{n+m+1}$ is a F. John point if $x_* \in X$, at $[x_*, v_*] \in E^{n+c}$ the complementarity condition (6.5) is satisfied, $q_* \ge 0$, $v_* \ge 0$, with $u_*$, $v_*$, $q_*$ not all equal to zero, and
$$L^0_x(x_*, u_*, v_*, q_*) = 0. \tag{7.2}$$
If $[x_*, u_*, v_*, q_*]$ is a F. John point and $q_* = 1$, then $[x_*, u_*, v_*]$ is a Kuhn-Tucker point.

Next we introduce two index sets:
$$\sigma(x) = \{ j \in [1:c] : h^j(x) = 0 \}, \qquad \sigma(x, v) = \{ j \in [1:c] : j \in \sigma(x),\ v^j > 0 \}. \tag{7.3}$$

The functions $g^i(x)$, $h^j(x)$ for which $g^i(x) = h^j(x) = 0$ are said to be active constraints at the point $x$. The index set $\sigma(x)$ thus defines the totality of active constraints of the inequality type at the point $x$.
DEFINITION 1.7.3. The constraints $g(x) = 0$, $h(x) \le 0$ satisfy the constraint qualification at the point $x \in X$ if the vector functions $g(x)$, $h(x)$ are differentiable at $x$ and the vectors $g^i_x(x)$, $h^j_x(x)$, $i \in [1:e]$, $j \in \sigma(x)$, are linearly independent.
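The definitions above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the test problem, the function names (`active_sets`, `is_kuhn_tucker`, `constraint_qualification`) and the tolerance are hypothetical and not taken from the text, and indices are zero-based as is usual in Python. Gradients are stored as columns, matching the convention of (7.1).

```python
import numpy as np

# Hypothetical test problem (not from the text):
#   minimize (x1 - 2)^2 + x2^2
#   subject to g(x) = x1 + x2 - 2 = 0, h1(x) = -x1 <= 0, h2(x) = x1 - 1 <= 0.
def f_x(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * x[1]])

def g(x):
    return np.array([x[0] + x[1] - 2.0])

def g_x(x):
    return np.array([[1.0], [1.0]])             # n x e, columns are gradients

def h(x):
    return np.array([-x[0], x[0] - 1.0])

def h_x(x):
    return np.array([[-1.0, 1.0], [0.0, 0.0]])  # n x c, columns are gradients

def active_sets(x, v, tol=1e-9):
    """Index sets sigma(x) and sigma(x, v) of (7.3), zero-based."""
    hx = h(x)
    sigma = [j for j in range(len(hx)) if abs(hx[j]) <= tol]
    sigma_v = [j for j in sigma if v[j] > tol]
    return sigma, sigma_v

def is_kuhn_tucker(x, u, v, tol=1e-9):
    """Feasibility, v >= 0, complementarity and stationarity (7.1)."""
    feasible = np.all(np.abs(g(x)) <= tol) and np.all(h(x) <= tol)
    complementary = np.all(np.abs(v * h(x)) <= tol)
    stationary = np.linalg.norm(f_x(x) + g_x(x) @ u + h_x(x) @ v) <= tol
    return bool(feasible and np.all(v >= 0) and complementary and stationary)

def constraint_qualification(x, tol=1e-9):
    """Linear independence of equality and active inequality gradients."""
    sigma, _ = active_sets(x, np.zeros(len(h(x))), tol)
    grads = np.hstack([g_x(x), h_x(x)[:, sigma]])
    return bool(np.linalg.matrix_rank(grads) == grads.shape[1])

x_star = np.array([1.0, 1.0])
u_star = np.array([-2.0])
v_star = np.array([0.0, 4.0])
sigma, sigma_v = active_sets(x_star, v_star)
```

For this example only the second inequality is active, and its multiplier is positive, so both index sets contain the same single index.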
DEFINITION 1.7.4. The constraints $g(x) = 0$ and $h(x) \le 0$ satisfy the Arrow-Hurwicz-Uzawa CQ at the point $x \in X$ if the vector function $h(x)$ is differentiable at $x$, $g(x)$ is continuously differentiable at $x$, the vectors $g^i_x(x)$, $i \in [1:e]$, are linearly independent, and there exists a vector $z \in E^n$ satisfying the relations
$$g_x^T(x) z = 0, \qquad \langle h^j_x(x), z \rangle < 0 \quad \forall j \in \sigma(x). \tag{7.4}$$


In some cases, one needs to impose special constraints on 
g(x) and h(x) at the points not belonging to the feasible set. 
This occurs in the following definition. 
DEFINITION 1.7.5. The constraints $g(x) = 0$ and $h(x) \le 0$ satisfy the strengthened CQ if the vector functions $g(x)$ and $h(x)$ are everywhere differentiable; at each point $x \in X$ the constraint qualification is satisfied and at each point $x \notin X$ there exists a vector $z \in E^n$ satisfying the relations


$$g(x) + g_x^T(x) z = 0, \qquad h(x) + h_x^T(x) z < 0.$$


In considering convex programming problems, we can assume without loss of generality that the condition $g(x) = 0$ defines an $(n-e)$-dimensional linear manifold, since, if this condition were not valid, it would be possible to reduce the number of equality-type constraints by removing the dependent conditions; the feasible set $X$ does not change in this situation. Hence in the sequel we will assume that in convex programming problems the rank of the matrix $g_x(x)$ is maximal and equal to $e$.

For convex programming problems the Arrow-Hurwicz-Uzawa CQ will be satisfied if the vector function $h(x)$ is differentiable and Slater's CQ holds. Indeed, let $g(\bar{x}) = 0$, $h(\bar{x}) < 0$; then, using the linearity of $g(x)$ and the convexity of $h(x)$, we obtain for $x \in X$
$$g(\bar{x}) = g(x) + g_x^T(x)(\bar{x} - x) = 0,$$
$$\langle h^j_x(x), \bar{x} - x \rangle \le h^j(\bar{x}) - h^j(x) = h^j(\bar{x}) < 0 \quad \forall j \in \sigma(x). \tag{7.5}$$
Letting $z = \bar{x} - x$, we arrive at (7.4).
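The argument that a Slater point yields the vector $z$ required in (7.4) can be illustrated numerically. The following is a minimal sketch on a hypothetical convex problem (a linear equality and one convex quadratic inequality); the problem data and variable names are assumptions made for the illustration only.

```python
import numpy as np

# Hypothetical convex problem: g(x) = x1 - x2 = 0 (linear),
# h(x) = x1^2 + x2^2 - 2 <= 0 (convex).
def g(x):
    return np.array([x[0] - x[1]])

def g_x(x):
    return np.array([[1.0], [-1.0]])                  # n x e

def h(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

def h_x(x):
    return np.array([[2.0 * x[0]], [2.0 * x[1]]])     # n x c

x_bar = np.array([0.0, 0.0])   # Slater point: g = 0, h < 0
x = np.array([1.0, 1.0])       # feasible point where the inequality is active
z = x_bar - x                  # candidate direction from the argument above

eq_part = g_x(x).T @ z         # should vanish, first relation in (7.4)
ineq_part = h_x(x).T @ z       # should be negative, second relation in (7.4)
bound = h(x_bar) - h(x)        # convexity bound appearing in (7.5)
```

Here `ineq_part` does not exceed `bound`, which is itself negative, exactly as the chain of inequalities (7.5) predicts.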
LEMMA 1.7.1 (Motzkin's Lemma on Transposition). Let the matrices $A$ and $B$ be respectively of dimensions $a \times d$ and $b \times d$. Then either the system
$$Az = 0, \qquad Bz < 0, \qquad z \in E^d,$$
or the system
$$A^T z_1 + B^T z_2 = 0, \qquad z_1 \in E^a, \quad z_2 \in E^b_+, \quad \|z_2\| \ne 0$$
has a solution. These two systems cannot have solutions at the same time. The proof can be found in Kuhn and Tucker [2].

LEMMA 1.7.2. Let the functions $g$, $h$ be differentiable at the point $x \in X$ at which the Arrow-Hurwicz-Uzawa CQ is satisfied, and in a convex programming problem let Slater's CQ or Karlin's CQ be satisfied. Then for any $u \in E^e$, $v \in E^c_+$ such that the $u^i$, $v^j$, $i \in [1:e]$, $j \in \sigma(x)$, are not all equal to zero at the same time, we have the condition
$$g_x(x) u + \sum_{j \in \sigma(x)} h^j_x(x) v^j \ne 0. \tag{7.6}$$


Proof. If $\sigma(x) = \emptyset$, the required inequality follows from the linear independence of the vectors $g^i_x(x)$, $i \in [1:e]$. If the set $\sigma(x)$ is not empty, we take advantage of the previous Lemma, taking for $A$ the matrix $g_x^T(x)$, and for the rows of the matrix $B$ the gradients $h^j_x(x)$, $j \in \sigma(x)$. The system (7.4) is solvable, hence the inequality (7.6) holds for any $u$, $v$ satisfying the condition of the Lemma. For the convex programming problem the conditions (7.5) are used, which follow from Slater's CQ. ///


2. NECESSARY AND SUFFICIENT CONDITIONS FOR


A MINIMUM OF CONVEX PROGRAMMING PROBLEMS 


For convex programming problems the differentiability condition in Definitions 1.7.1 and 1.7.2 can be omitted, and instead of (7.1) and (7.2) we could require the following: let there exist vectors
$$z^0 \in \partial f(x_*), \qquad z^j \in \partial h^j(x_*), \quad j \in [1:c],$$
such that respectively
$$z^0 + g_x(x_*) u_* + \sum_{j=1}^{c} z^j v_*^j = 0, \tag{7.7}$$
$$q_* z^0 + g_x(x_*) u_* + \sum_{j=1}^{c} z^j v_*^j = 0. \tag{7.8}$$


THEOREM 1.7.1. Let in the convex programming problem (6.1) the solution set $X_*$ be not empty. Then:

1. each Kuhn-Tucker point $[x_*, u_*, v_*]$ is a saddle point of the Lagrange function $L(x,u,v)$, and $x_* \in X_*$;

2. each F. John point $[x_*, u_*, v_*, q_*]$ is a saddle point of the function $L^0(x,u,v,q)$; for $q_* > 0$ we have $x_* \in X_*$;

3. if Slater's CQ or Karlin's CQ is satisfied, then there exist Kuhn-Tucker points (or F. John points $[x_*, u_*, v_*, q_*]$, $q_* > 0$).
Proof. From the convexity of $f(x)$, $h(x)$ and the linearity of $g(x)$ it follows that for arbitrary $x \in E^n$
$$f(x) \ge f(x_*) + \langle z^0, x - x_* \rangle, \tag{7.9}$$
$$h^j(x) \ge h^j(x_*) + \langle z^j, x - x_* \rangle, \tag{7.10}$$
$$g(x) = g(x_*) + g_x^T(x_*)(x - x_*). \tag{7.11}$$
We multiply (7.10) by $v_*^j$ and sum over $j \in [1:c]$, multiply (7.11) scalarwise by $u_*$, and add the relations obtained to (7.9). Noting that $v_* \ge 0$ and taking into account the definition of the function $L$ and the condition (7.7), we obtain
$$L(x, u_*, v_*) \ge L(x_*, u_*, v_*) + \Big\langle z^0 + g_x(x_*) u_* + \sum_{j=1}^{c} z^j v_*^j,\ x - x_* \Big\rangle = L(x_*, u_*, v_*).$$


Thus, we have proved the right inequality in (6.9). The left inequality has in this case the form
$$\sum_{j=1}^{c} v^j h^j(x_*) \le \sum_{j=1}^{c} v_*^j h^j(x_*).$$
By the complementarity condition (6.5) the right side of the last formula is zero; and since $x_* \in X$ we obtain that $\langle v, h(x_*) \rangle \le 0$ for any $v \ge 0$. Hence the inequalities (6.9) are satisfied, and, using Theorem 1.6.1, we infer that $x_* \in X_*$. Assertion 2 can be proved similarly using (7.8).

By Theorem 1.6.7, when Slater's CQ or Karlin's CQ is satisfied, there exists a saddle point $[x_*, u_*, v_*]$ of the function $L$. From Theorem 1.6.1 it follows that $x_* \in X_*$ and (6.5) holds. By Theorem 1.3.5, (7.7) follows from the right inequality of (6.9). Hence $[x_*, u_*, v_*]$ is a Kuhn-Tucker point. From Theorem 1.6.6 there follows the existence of a saddle point $[x_*, u_*, v_*, q_*]$ of the function $L^0$, which is a F. John point; furthermore, by Theorem 1.6.7, $q_* > 0$. ///


3. SUFFICIENT CONDITIONS FOR A MINIMUM IN 


GENERAL NONLINEAR PROGRAMMING PROBLEMS 


For the feasible point $x_*$ and the dual vector $v_*$ we define two convex cones:
$$K_1(x_*, v_*) = \{ x \in E^n : x^T g_x(x_*) = 0,\ x^T h^j_x(x_*) = 0,\ x^T h^s_x(x_*) \le 0 \}, \tag{7.12}$$
$$K_2(x_*) = \{ x \in E^n : x^T g_x(x_*) = 0,\ x^T h^i_x(x_*) \le 0,\ x^T f_x(x_*) \le 0 \}, \tag{7.13}$$
where the indices $j$, $s$, $i$ assume all possible values of the sets $\sigma(x_*, v_*)$, $\sigma(x_*) \setminus \sigma(x_*, v_*)$, $\sigma(x_*)$ respectively (see (7.3)).
LEMMA 1.7.3. Let $[x_*, u_*, v_*]$ be a Kuhn-Tucker point in the problem (6.1). Then the cones $K_1(x_*, v_*)$ and $K_2(x_*)$ coincide.
Proof. We show first that $K_2(x_*) \subset K_1(x_*, v_*)$. Let $\bar{x} \in K_2(x_*)$; then we need to show that
$$\bar{x}^T h^j_x(x_*) = 0 \quad \forall j \in \sigma(x_*, v_*). \tag{7.14}$$
Using (7.1), we obtain
$$\bar{x}^T f_x(x_*) + \bar{x}^T g_x(x_*) u_* + \bar{x}^T h_x(x_*) v_* = \bar{x}^T f_x(x_*) + \sum_{j \in \sigma(x_*, v_*)} \bar{x}^T h^j_x(x_*) v_*^j = 0.$$
Since each term in this equality is non-positive and all the $v_*^j > 0$ for $j \in \sigma(x_*, v_*)$, we infer that (7.14) holds, hence $\bar{x} \in K_1(x_*, v_*)$.
We now prove that $K_1(x_*, v_*) \subset K_2(x_*)$. Let $\bar{x} \in K_1(x_*, v_*)$. Then it follows from (7.1) that
$$\sum_{j \in \sigma(x_*, v_*)} \bar{x}^T h^j_x(x_*) v_*^j = -\bar{x}^T f_x(x_*) = 0.$$
Hence $\bar{x} \in K_2(x_*)$, which completes the proof of the lemma. ///
DEFINITION 1.7.6. The matrix of second derivatives $L_{xx}(x_*, u_*, v_*)$ is positive definite on the cone $K_1(x_*, v_*)$ if the quadratic form $x^T L_{xx}(x_*, u_*, v_*) x > 0$ for any nonzero vectors $x$ belonging to the cone $K_1(x_*, v_*)$.




DEFINITION 1.7.7. The matrix of the second derivatives $L_{xx}(x_*, u_*, v_*)$ is uniformly positive definite on the cone $K_1(x_*, v_*)$ with the constant $C$ if
$$x^T L_{xx}(x_*, u_*, v_*) x \ge C \|x\|^2$$
for any vectors $x$ belonging to the cone $K_1(x_*, v_*)$.

We shall state next the lemma of R. Finsler (see Appendix II) 
specifically for the problem in question. 
LEMMA 1.7.4 (R. Finsler). Let the matrix of the second derivatives $L_{xx}(x_*, u_*, v_*)$ of the Lagrange function be positive definite on the cone $K_1(x_*, v_*)$ and let at the point $[x_*, v_*]$ the strict complementarity condition be satisfied. Then there exists $\tau_*$ such that for any $\tau > \tau_*$ the matrix
$$L_{xx}(x_*, u_*, v_*) + \tau \Big[ g_x(x_*) g_x^T(x_*) + \sum_{j \in \sigma(x_*)} h^j_x(x_*) \big(h^j_x(x_*)\big)^T \Big]$$
is positive definite.
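The effect described by Finsler's lemma can be observed on a two-dimensional example. The sketch below is illustrative only: the matrix `L_xx` and the single constraint gradient `g_x` are hypothetical data chosen so that the threshold can be found by hand (here the augmented matrix is positive definite exactly for $\tau > 5$).

```python
import numpy as np

# Hypothetical data: an indefinite Hessian of the Lagrangian and one
# equality-constraint gradient. On the subspace {x : x^T g_x = 0}, i.e.
# x = (t, 0), the quadratic form equals t^2 > 0.
L_xx = np.array([[1.0, 2.0],
                 [2.0, -1.0]])
g_x = np.array([[0.0],
                [1.0]])

def min_eig(tau):
    """Smallest eigenvalue of L_xx + tau * g_x g_x^T."""
    return np.linalg.eigvalsh(L_xx + tau * (g_x @ g_x.T)).min()

indefinite_without_penalty = min_eig(0.0) < 0
still_indefinite = min_eig(4.0) < 0       # det = (4 - 1) - 4 < 0
positive_definite = min_eig(6.0) > 0      # det = (6 - 1) - 4 > 0, trace > 0
```

The matrix is indefinite on the whole space, yet adding a sufficiently large multiple of the constraint-gradient outer product makes it positive definite, which is what augmented Lagrangian constructions rely on.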


THEOREM 1.7.2 (McCormick). Let $[x_*, u_*, v_*] \in E^{n+m}$ be a Kuhn-Tucker point in the problem (6.1); let the functions defining the problem be twice differentiable, and let the matrix of the second derivatives $L_{xx}(x_*, u_*, v_*)$ be positive definite on the cone $K_1(x_*, v_*)$. Then $x_*$ is the isolated local minimum of the problem (6.1).

This Theorem is one of the basic results in the theory of nonlinear programming. Subsequent to publication of this Theorem, various modifications and some generalizations have been obtained. Following S. Han and O. Mangasarian [1], we shall formulate and prove a theorem generalizing the result obtained by McCormick.
THEOREM 1.7.3. Let $[x_*, u_*, v_*, q_*] \in E^{n+m+1}$ be a F. John point in the problem (6.1); let the functions defining the problem be twice differentiable at a point $x_*$, and let the matrix of the second derivatives $L^0_{xx}(x_*, u_*, v_*, q_*)$ be positive definite on the cone $K_1(x_*, v_*)$ or $K_2(x_*)$. Then the point $x_*$ is the isolated local minimum of the problem (6.1).


Proof. Assume the converse, that is, $x_*$ is not the local isolated minimum. Then there exists a sequence of feasible points $\{x_k\}$ such that all $x_k \ne x_*$, $\lim_{k \to \infty} x_k = x_*$ and, furthermore, for each point $x_k$ the inequality $f(x_k) \le f(x_*)$ is satisfied. Let $y_k = (x_k - x_*) / \|x_k - x_*\|$. It follows from the construction of the points $x_k$ that the conditions
$$0 \ge \frac{f(x_k) - f(x_*)}{\|x_k - x_*\|} = y_k^T f_x(x_*) + O(\|x_k - x_*\|),$$
$$0 = \frac{g(x_k)}{\|x_k - x_*\|} = y_k^T g_x(x_*) + O(\|x_k - x_*\|),$$
$$0 \ge \frac{h^j(x_k)}{\|x_k - x_*\|} = y_k^T h^j_x(x_*) + O(\|x_k - x_*\|) \quad \forall j \in \sigma(x_*)$$
are satisfied. Here $O(z)$ means that $\lim_{z \to 0} O(z) = 0$. Hence there exists a limiting point $\bar{y}$ of the sequence $y_k$ such that
$$\bar{y}^T f_x(x_*) \le 0, \qquad \bar{y}^T g_x(x_*) = 0, \qquad \bar{y}^T h^j_x(x_*) \le 0 \quad \forall j \in \sigma(x_*),$$
yielding $\bar{y} \in K_2(x_*)$.
We shall use the condition for existence of the second derivatives of the functions defining the problem. Then
$$0 \ge \frac{f(x_k) - f(x_*)}{\|x_k - x_*\|^2} = \frac{y_k^T f_x(x_*)}{\|x_k - x_*\|} + \frac{1}{2} y_k^T f_{xx}(x_*) y_k + O(\|x_k - x_*\|),$$
$$0 = \frac{g^i(x_k)}{\|x_k - x_*\|^2} = \frac{y_k^T g^i_x(x_*)}{\|x_k - x_*\|} + \frac{1}{2} y_k^T g^i_{xx}(x_*) y_k + O(\|x_k - x_*\|),$$
$$0 \ge \frac{h^j(x_k)}{\|x_k - x_*\|^2} = \frac{y_k^T h^j_x(x_*)}{\|x_k - x_*\|} + \frac{1}{2} y_k^T h^j_{xx}(x_*) y_k + O(\|x_k - x_*\|).$$
Multiplying the relations obtained respectively by $q_*$, $u_*^i$, $v_*^j$, $j \in \sigma(x_*)$, summing them up, and making use of (7.2), we obtain
$$0 \ge \frac{1}{2} y_k^T L^0_{xx}(x_*, u_*, v_*, q_*) y_k + O(\|x_k - x_*\|).$$
Letting $x_k$ go to $x_*$, we obtain that $y_k \to \bar{y}$, having in this case $0 \ge \bar{y}^T L^0_{xx}(x_*, u_*, v_*, q_*) \bar{y}$, which contradicts the positive definiteness of the matrix $L^0_{xx}(x_*, u_*, v_*, q_*)$ on $K_2(x_*)$. ///

If the conditions of Theorem 1.7.2 are satisfied, then for $q_* = 1$ the conditions of Theorem 1.7.3 will be satisfied, which proves the validity of Theorem 1.7.2 (McCormick). Theorem 1.7.3 provides stronger sufficient conditions than Theorem 1.7.2 does; there are problems for which the conditions of Theorem 1.7.3 are assured, but it is not possible to satisfy the conditions of Theorem 1.7.2.

In what follows we shall need the following lemma. 

LEMMA 1.7.5. Let the conditions of Theorem 1.7.2 be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$. Then for any vector $[u, v] \in E^m$ such that $u^i > |u_*^i|$, $i \in [1:e]$, $v^j > v_*^j$, $j \in [1:c]$, the point $x_*$ is the strict local minimum of the function
$$L_1(x, u, v) = f(x) + \sum_{i=1}^{e} u^i |g^i(x)| + \sum_{j=1}^{c} v^j h^j_+(x). \tag{7.15}$$


Proof. If the assertion of the Lemma is false, there exists a sequence of points $x_k$ converging to $x_*$ such that $x_k \ne x_*$ and
$$L_1(x_k, u, v) \le L_1(x_*, u, v).$$
This implies that
$$\psi(x_k) = f(x_k) - f(x_*) + \sum_{i=1}^{e} u^i |g^i(x_k)| + \sum_{j=1}^{c} v^j h^j_+(x_k) \le 0.$$
Let
$$\lim_{k \to \infty} \frac{x_k - x_*}{\|x_k - x_*\|} = y.$$
The function $\psi(x)$ can be represented as the maximum function by letting
$$|g^i(x)| = \max\,[g^i(x), -g^i(x)], \qquad h^j_+(x) = \max\,[0, h^j(x)].$$


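Then, by Theorem 1.5.10, the derivative of the function $\psi(x)$ in the direction $y$ at the point $x_*$ satisfies the condition
$$\frac{\partial \psi(x_*)}{\partial y} = y^T f_x(x_*) + \sum_{i=1}^{e} u^i |y^T g^i_x(x_*)| + \sum_{j \in \sigma(x_*)} v^j \langle y, h^j_x(x_*) \rangle_+ \le 0.$$
Using (7.1) we express the gradient $f_x(x_*)$ and put it into the inequality obtained. Then we have
$$\sum_{i=1}^{e} \big[ u^i |y^T g^i_x(x_*)| - u_*^i\, y^T g^i_x(x_*) \big] + \sum_{j \in \sigma(x_*)} \big[ v^j \langle y, h^j_x(x_*) \rangle_+ - v_*^j \langle y, h^j_x(x_*) \rangle \big] \le 0.$$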


Since $u^i > |u_*^i|$ and $v^j > v_*^j$, each term in the square brackets is nonnegative and therefore is zero. Hence
$$y^T g^i_x(x_*) = 0, \qquad y^T h^j_x(x_*) = 0, \qquad y^T h^s_x(x_*) \le 0$$
for $i \in [1:e]$, $j \in \sigma(x_*, v_*)$, $s \in \sigma(x_*) \setminus \sigma(x_*, v_*)$. Thus, $y \in K_1(x_*, v_*)$ and $y^T L_{xx}(x_*, u_*, v_*) y > 0$, whence we obtain that for all $k$ beginning with some $k_0$ the inequality $L(x_k, u_*, v_*) > L(x_*, u_*, v_*)$ is satisfied. Next we have the inequalities
$$L_1(x_k, u, v) \ge L(x_k, u_*, v_*) > L(x_*, u_*, v_*) = f(x_*) = L_1(x_*, u, v) \ge L_1(x_k, u, v).$$
The contradiction obtained proves the Lemma. ///
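The role of the multiplier thresholds in the function (7.15) can be seen on a small example. The sketch below is a hedged illustration: the test problem, its Kuhn-Tucker point $x_* = (1,1)$ with $u_* = -2$, $v_* = (0, 4)$, the sampling radius and the helper name `is_strict_local_min` are all assumptions introduced here, and random sampling only gives numerical evidence, not a proof.

```python
import numpy as np

# Hypothetical problem: minimize (x1 - 2)^2 + x2^2 subject to
# g(x) = x1 + x2 - 2 = 0, h1(x) = -x1 <= 0, h2(x) = x1 - 1 <= 0.
def f(x):
    return (x[0] - 2.0) ** 2 + x[1] ** 2

def g(x):
    return np.array([x[0] + x[1] - 2.0])

def h(x):
    return np.array([-x[0], x[0] - 1.0])

def L1(x, u, v):
    """Nonsmooth function (7.15): f + sum u^i |g^i| + sum v^j max(0, h^j)."""
    return f(x) + u @ np.abs(g(x)) + v @ np.maximum(0.0, h(x))

x_star = np.array([1.0, 1.0])
rng = np.random.default_rng(0)

def is_strict_local_min(u, v, radius=1e-3, trials=2000):
    """Sample points on a small sphere around x_star and compare values."""
    base = L1(x_star, u, v)
    for _ in range(trials):
        d = rng.normal(size=2)
        d *= radius / np.linalg.norm(d)
        if L1(x_star + d, u, v) <= base:
            return False
    return True

# Weights above the thresholds |u_*| = 2 and v_* = (0, 4).
large_weights_ok = is_strict_local_min(np.array([3.0]), np.array([1.0, 5.0]))

# A weight u = 1 < |u_*| admits a descent direction, e.g. d = (0, -1e-3).
u_small, v_any = np.array([1.0]), np.array([1.0, 5.0])
small_weight_fails = (L1(x_star + np.array([0.0, -1e-3]), u_small, v_any)
                      < L1(x_star, u_small, v_any))
```

With the larger weights no sampled point improves on `x_star`, while with the too-small equality weight a feasible-looking step along the second coordinate already lowers the value of (7.15).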


4. MODIFICATIONS OF McCORMICK'S THEOREM


Later on, while studying numerical methods we shall need other 
formulations of Theorem 1.7.2. They are the following. 

If at $[x_*, v_*]$ the strict complementarity condition is satisfied, then $\sigma(x_*) = \sigma(x_*, v_*)$, and the cones $K_1(x_*, v_*)$ and $K_2(x_*)$ specified by (7.12) and (7.13) coincide at the Kuhn-Tucker point with the cone
$$K_3(x_*) = \{ x \in E^n : x^T g_x(x_*) = 0,\ x^T h^j_x(x_*) = 0,\ j \in \sigma(x_*) \}. \tag{7.16}$$


Theorem 1.7.3 can be stated now as follows: 
THEOREM 1.7.4. Let $[x_*, u_*, v_*]$ be a Kuhn-Tucker point in the problem (6.1), and let the strict complementarity condition be satisfied at $[x_*, v_*]$; let the functions defining the problem be twice differentiable at $x_*$; let the matrix $L_{xx}(x_*, u_*, v_*)$ be positive definite on the cone $K_3(x_*)$. Then $x_*$ is the isolated local minimum of the problem (6.1).
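Under strict complementarity the cone in (7.16) is a linear subspace, so positive definiteness on it can be tested through a basis of the null space of the active constraint gradients. The sketch below shows one common way to do this (an SVD-based null-space basis and the reduced Hessian); the example problem and the helper names are hypothetical.

```python
import numpy as np

def null_space_basis(rows, tol=1e-10):
    """Orthonormal basis (as columns) of {x : rows @ x = 0}, via the SVD."""
    _, s, vt = np.linalg.svd(rows)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def positive_definite_on_subspace(L_xx, active_gradients):
    """active_gradients: n x k matrix whose columns are the gradients
    of the equality and active inequality constraints."""
    Z = null_space_basis(active_gradients.T)
    if Z.shape[1] == 0:          # the subspace is {0}: nothing to check
        return True
    reduced = Z.T @ L_xx @ Z     # reduced Hessian
    return bool(np.linalg.eigvalsh(reduced).min() > 0)

# Hypothetical problem: minimize -x1*x2 subject to x1 + x2 - 2 = 0.
# Kuhn-Tucker point: x_* = (1, 1), u_* = 1; the constraint is linear,
# so L_xx equals the Hessian of f.
L_xx = np.array([[0.0, -1.0],
                 [-1.0, 0.0]])
g_x = np.array([[1.0],
                [1.0]])

indefinite_on_whole_space = np.linalg.eigvalsh(L_xx).min() < 0
sufficient_condition_holds = positive_definite_on_subspace(L_xx, g_x)
```

The example is chosen so that the Hessian of the Lagrangian is indefinite on the whole space and yet the sufficient condition holds on the subspace, which is the typical situation the theorem addresses.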

Introducing additional variables, we reduce the problem (6.1) to that of finding the minimum in the presence of equality-type constraints only, and for this new problem we state sufficient conditions for the minimum. Let us introduce the vector $p \in E^c$. Also, we express the feasible set in the problem (6.1) in terms of
$$X = \{ x \in E^n : g(x) = 0,\ h^j(x) + \tfrac{1}{2}(p^j)^2 = 0,\ j \in [1:c] \}.$$
The initial problem (6.1) consists now in minimizing the function $f$ over the two vectors $x$ and $p$:
$$\min_{x \in X} \min_{p \in E^c} f(x).$$
Hence it is appropriate to extend the vector $x$, letting $z = [x, p] \in E^{n+c}$, and to combine the functions defining the constraints:
$$R(z) = \big[ g^1(x), \ldots, g^e(x),\ h^1(x) + \tfrac{1}{2}(p^1)^2, \ldots, h^c(x) + \tfrac{1}{2}(p^c)^2 \big].$$
Next we represent the problem (6.1) in the form:
$$\min_{z \in Z} f(x), \qquad Z = \{ z \in E^{n+c} : R(z) = 0 \}. \tag{7.17}$$


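For this problem we introduce the dual vector $y \in E^m$, $m = e + c$, and write the Lagrange function in the form
$$L(z, y) = f(x) + \sum_{i=1}^{e} y^i g^i(x) + \sum_{j=1}^{c} y^{e+j} \big[ h^j(x) + \tfrac{1}{2}(p^j)^2 \big]. \tag{7.18}$$
Also, we write the matrix $R_z$ of dimension $(n+c) \times m$ in terms of
$$R_z(z) = \begin{bmatrix} g_x(x) & h_x(x) \\ 0 & D(p) \end{bmatrix}.$$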


If in the problem (6.1) the point $x_* \in X_*$ and the constraints $g(x) = 0$, $h(x) \le 0$ satisfy the CQ, then the point $z_* = [x_*, p_*]$, where
$$p_*^j = \sqrt{-2 h^j(x_*)}, \quad j \in [1:c], \tag{7.19}$$
is feasible and optimal in the problem (7.17), and the constraints $R(z) = 0$ satisfy the CQ.

The converse is also true: if the point $z_* = [x_*, p_*]$ is feasible and optimal in the problem (7.17) and the constraints $R(z) = 0$ satisfy the CQ, then the point $x_*$ is feasible and optimal in the problem (6.1), and the constraints $g(x) = 0$, $h(x) \le 0$ satisfy the constraint qualification at the point $x_*$.

Let $[x_*, u_*, v_*]$ be the Kuhn-Tucker point in the problem (6.1). Then, letting $y_* = [u_*, v_*]$ and using (6.5), (7.18), (7.19) and the condition $x_* \in X$, we obtain
$$L_z(z_*, y_*) = 0, \qquad L_y(z_*, y_*) = R(z_*) = 0. \tag{7.20}$$
In a more elaborate form, these conditions become:
$$L_x(z_*, y_*) = f_x(x_*) + R_x(z_*) y_* = 0,$$
$$L_{p^j}(z_*, y_*) = y_*^{e+j} p_*^j = y_*^{e+j} \sqrt{-2 h^j(x_*)} = 0,$$
$$L_y(z_*, y_*) = \begin{bmatrix} g(x_*) \\ h(x_*) + \tfrac{1}{2} D(p_*) p_* \end{bmatrix} = 0. \tag{7.21}$$


Thus, to each Kuhn-Tucker point $[x_*, u_*, v_*]$ for the problem (6.1) there corresponds a stationary point $[z_*, y_*]$ of the Lagrange function $L(z, y)$ of the problem (7.17). Conversely, if $[z_*, y_*]$ is the stationary point of the Lagrange function $L$ and $y_*^{e+j} \ge 0$ for all $j \in \sigma(x_*)$, then $[x_*, u_*, v_*]$ is the Kuhn-Tucker point in the problem (6.1). Let
$$L_{zz}(z_*, y_*) = \begin{bmatrix} L_{xx}(x_*, u_*, v_*) & 0 \\ 0 & D(v_*) \end{bmatrix}. \tag{7.22}$$


THEOREM 1.7.5. Let the functions defining the problem (6.1) be twice differentiable at the point $x_* \in X$. Let there exist a vector $y_* \in E^m$ such that conditions (7.20) are satisfied at the point $[z_*, y_*] \in E^{n+m+c}$, where $z_* = [x_*, p_*] \in E^{n+c}$ and the coordinates $p_*$ are defined from (7.19). Let the matrix $L_{zz}(z_*, y_*)$ be positive definite on the subspace
$$K_4(z_*) = \{ \bar{z} \in E^{n+c} : \bar{z}^T R_z(z_*) = 0 \}. \tag{7.23}$$
Then the point $x_*$ is the isolated local minimum of the problem (6.1).
This Theorem can be proved via the same considerations as in proving Theorem 1.7.3; but it is much simpler to show that the conditions of this Theorem imply the conditions of Theorem 1.7.4. Let $\bar{z} = [\bar{x}, \bar{p}] \in K_4(z_*)$. Then
$$\bar{x}^T g_x(x_*) = 0, \qquad \bar{x}^T h^j_x(x_*) + \bar{p}^j p_*^j = 0, \quad j \in [1:c].$$
If $j \in \sigma(x_*)$, then, by (7.19), $p_*^j = 0$ and for the vector $\bar{z} \in K_4(z_*)$ we may take a vector in which only one coordinate $\bar{z}^{n+j} = \bar{p}^j$ is nonzero. From the positive definiteness of $L_{zz}(z_*, y_*)$ on $K_4(z_*)$ it then follows that $y_*^{e+j} (\bar{p}^j)^2 > 0$ and therefore $v_*^j > 0$ for all $j \in \sigma(x_*)$. Hence at the point $[x_*, v_*]$ the strict complementarity condition is satisfied. From the fact that $p_*^j = 0$ for all $j \in \sigma(x_*)$ we infer that each vector $[\bar{x}, \bar{p}]$ belonging to $K_4(z_*)$ is such that the vector $\bar{x} \in K_3(x_*)$. All the conditions of Theorem 1.7.4 are thus satisfied.
This technique for reducing the general problem (6.1) to the problem (7.17), with equality-type constraints only, will be used in the future in describing the numerical methods. We say that the auxiliary vector $p$ introduced above is an artificial vector.
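The reduction to equality constraints by means of the artificial vector can be tried out on a small problem. The sketch below assumes that the artificial variables enter as $h^j(x) + \tfrac{1}{2}(p^j)^2 = 0$; with a different scaling of the square the formula for $p_*$ changes accordingly. The test problem and all names are hypothetical.

```python
import numpy as np

# Hypothetical problem: minimize (x1 - 2)^2 + x2^2 subject to
# g(x) = x1 + x2 - 2 = 0, h1(x) = -x1 <= 0, h2(x) = x1 - 1 <= 0,
# with Kuhn-Tucker point x_* = (1, 1), u_* = -2, v_* = (0, 4).
def g(x):
    return np.array([x[0] + x[1] - 2.0])

def h(x):
    return np.array([-x[0], x[0] - 1.0])

def R(x, p):
    """Combined equality constraints (assumed form: h^j + (p^j)^2 / 2)."""
    return np.concatenate([g(x), h(x) + 0.5 * p ** 2])

x_star = np.array([1.0, 1.0])
u_star = np.array([-2.0])
v_star = np.array([0.0, 4.0])

# Artificial variables chosen so that the new equalities hold at x_*.
p_star = np.sqrt(np.maximum(0.0, -2.0 * h(x_star)))
y_star = np.concatenate([u_star, v_star])

residual = R(x_star, p_star)
# Derivative of the Lagrange function with respect to p^j: y^{e+j} * p^j.
L_p = y_star[len(u_star):] * p_star
```

The inactive constraint gets a nonzero artificial variable and a zero multiplier, the active one gets a zero artificial variable and a positive multiplier, so the products in `L_p` vanish, mirroring the complementarity condition.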


5. NECESSARY CONDITIONS FOR A MINIMUM 


For convex programming problems necessary and sufficient conditions for a minimum are given in Theorem 1.7.1. Now we formulate necessary conditions for a minimum for a general problem of nonlinear programming, (6.29).
THEOREM 1.7.6. Let $U$ be a convex set the interior of which is not empty; let the functions defining the problem (6.29) be given on some open set containing $U$, and also differentiable at the point $x_*$ belonging to the solution set $X_*$ of the problem (6.29); and let the vector function $g$ have continuous first partial derivatives at the point $x_*$. Then:

1. it is necessary that $q_* \in E^1_+$, $u_* \in E^e$, $v_* \in E^c_+$ exist, not equal to zero at the same time and such that at the point $[x_*, v_*]$ the complementarity condition (6.5) is satisfied, and for any $x \in U$ the following inequality holds:
$$\langle x - x_*, L^0_x(x_*, u_*, v_*, q_*) \rangle \ge 0; \tag{7.24}$$

2. if the function $L^0(x, u_*, v_*, q_*)$ is pseudoconvex in $x$ with respect to the set $U$, it is necessary that
$$x_* \in \operatorname{Arg} \min_{x \in U} L^0(x, u_*, v_*, q_*);$$

3. if the set $U$ is open, and the Arrow-Hurwicz-Uzawa CQ is satisfied at the point $x_*$, then in the problem (6.29) the Kuhn-Tucker point $[x_*, u_*, v_*]$ exists.
The proof of the theorem can be found, for example, in O. Mangasarian [1]; assertion 1 has been proved also in F. P. Vasil'ev [1] and in N. N. Moiseev et al. [1].
6. PARAMETRIC PROGRAMMING 


The functions defining the problem (6.1) may contain a parameter. 
Parametric programming is in fact the investigation of the depen- 
dence between solutions and the parameter. We discuss this subject 


briefly. We consider the problem of finding 


$$\min_{x \in X} \max_{y \in Y(x)} F(x, y), \tag{7.25}$$
$$X = \{ x \in E^n : H(x) \le 0 \}, \qquad Y(x) = \{ y \in E^m : B(x, y) \le 0 \},$$
where $H: E^n \to E^k$, $B: E^n \times E^m \to E^s$. The interior problem in this case is a nonlinear programming problem and consists in finding
$$y(x) \in \operatorname{Arg} \max_{y \in Y(x)} F(x, y). \tag{7.26}$$
We compose the Lagrange function
$$L^2(y, \lambda, x) = F(x, y) - \sum_{t=1}^{s} \lambda^t B^t(x, y).$$
Assume that for any $x$ the set $Y(x)$ is not empty and there exist vector functions $y(x)$, $\lambda(x)$ such that they make a Kuhn-Tucker point:
$$L^2_y(y(x), \lambda(x), x) = F_y(x, y(x)) - \sum_{t=1}^{s} \lambda^t(x) B^t_y(x, y(x)) = 0, \tag{7.27}$$
$$\lambda^t(x) B^t(x, y(x)) = 0, \qquad \lambda(x) \ge 0, \qquad B(x, y(x)) \le 0. \tag{7.28}$$
The exterior problem consists in finding
$$x_* \in \operatorname{Arg} \min_{x \in X} \varphi(x), \qquad \varphi(x) = F(x, y(x)).$$
Assuming that the functions $F(x, y)$, $y(x)$ are continuously differentiable, we obtain
$$\frac{d\varphi}{dx} = F_x(x, y(x)) + \frac{dy}{dx} F_y(x, y(x)).$$

The second term is not, generally speaking, equal to zero, and this makes the solution of the exterior problem more complicated since it requires that the matrix $dy/dx$ be determined. Fortunately, the situation becomes simpler because the following formula holds true:
$$\frac{d\varphi}{dx} = F_x(x, y(x)) - \sum_{t=1}^{s} \lambda^t(x) B^t_x(x, y(x)). \tag{7.29}$$
t=1 
Indeed, assuming that the functions $F$, $B$, $y(x)$, $\lambda(x)$ are continuously differentiable in $x$, we differentiate the first relation in (7.28). We obtain
$$\frac{d\lambda^t}{dx} B^t(x, y(x)) + \lambda^t(x) \Big[ B^t_x(x, y(x)) + \frac{dy}{dx} B^t_y(x, y(x)) \Big] = 0.$$
Multiplying this equality by $\lambda^t(x)$ and noting the first relation in (7.28), we obtain
$$\lambda^t(x) \frac{dy}{dx} B^t_y(x, y(x)) = -\lambda^t(x) B^t_x(x, y(x)).$$
Taking into account (7.27), we obtain
$$\frac{dy}{dx} F_y(x, y(x)) = \sum_{t=1}^{s} \lambda^t(x) \frac{dy}{dx} B^t_y(x, y(x)) = -\sum_{t=1}^{s} \lambda^t(x) B^t_x(x, y(x)),$$
which yields (7.29).
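Formula (7.29) can be verified on a one-dimensional example in which the interior problem is solvable in closed form. The sketch below is an illustration under stated assumptions: the functions $F(x,y) = -(y-x)^2$ and $B(x,y) = y - x/2$, the restriction to $x > 0$ and all function names are hypothetical choices, for which the interior maximizer is $y(x) = x/2$ with multiplier $\lambda(x) = x$.

```python
# Hypothetical interior problem for x > 0:
#   maximize F(x, y) = -(y - x)^2 over y subject to B(x, y) = y - x/2 <= 0.
# The constraint is active at the maximizer, y(x) = x/2, lambda(x) = x.
def F(x, y):
    return -(y - x) ** 2

def F_x(x, y):
    return 2.0 * (y - x)

def F_y(x, y):
    return -2.0 * (y - x)

def B_x(x, y):
    return -0.5

def B_y(x, y):
    return 1.0

def y_of_x(x):
    return 0.5 * x

def lam_of_x(x):
    return x

def phi(x):
    return F(x, y_of_x(x))

x0 = 1.2
y0, lam0 = y_of_x(x0), lam_of_x(x0)

# Stationarity (7.27) of the interior Lagrange function in y.
stationarity = F_y(x0, y0) - lam0 * B_y(x0, y0)

# Derivative of phi by formula (7.29), without using dy/dx.
derivative_by_formula = F_x(x0, y0) - lam0 * B_x(x0, y0)

# Reference value: central finite difference of phi.
step = 1e-5
derivative_numeric = (phi(x0 + step) - phi(x0 - step)) / (2.0 * step)

# Dropping the multiplier term gives a different (wrong) value.
partial_only = F_x(x0, y0)
```

The partial derivative $F_x$ alone differs from the true derivative of $\varphi$ by exactly the multiplier term, which is the correction that (7.29) supplies.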


For an exterior problem the Lagrangian has the form
$$L^1(x, \mu) = F(x, y(x)) + \sum_{i=1}^{k} \mu^i H^i(x).$$
The necessary condition for the minimum is that the conditions
$$F_x(x_*, y_*) - \sum_{t=1}^{s} B^t_x(x_*, y_*) \lambda_*^t + \sum_{i=1}^{k} \mu_*^i H^i_x(x_*) = 0,$$
$$\mu_*^i H^i(x_*) = 0, \qquad \mu_* \ge 0, \qquad H(x_*) \le 0$$
be satisfied, where $y_* = y(x_*)$, $\lambda_* = \lambda(x_*)$. Assuming that the functions $\lambda(x)$, $y(x)$ are known, we arrive at the usual problem of nonlinear programming, in which the minimum of the function $F(x, y(x))$ is sought.
The next theorem provides sufficient conditions for the functions $y(x)$, $\lambda(x)$ to be differentiable. Let
$$\sigma(x, y(x)) = \{ j \in [1:s] : B^j(x, y(x)) = 0 \}.$$
THEOREM 1.7.7. Let the functions $F$, $B$, $H$ be twice continuously differentiable in all the arguments; let at the Kuhn-Tucker point $[y(x), \lambda(x)]$ in the problem (7.26) the sufficient conditions for the maximum following from (McCormick's) Theorem 1.7.2 be satisfied; let at the point $[y(x), \lambda(x)]$ the strict complementarity condition be satisfied; let the gradients $B^j_y(x, y(x))$ be linearly independent for $j \in \sigma(x, y(x))$. Then the functions $y(x)$, $\lambda(x)$ are differentiable, and their derivatives are defined by the following system:
$$\begin{bmatrix} L^2_{yy} & -B_y(x, y(x)) \\ D(\lambda(x)) B_y^T(x, y(x)) & D(B(x, y(x))) \end{bmatrix} \begin{bmatrix} (dy/dx)^T \\ (d\lambda/dx)^T \end{bmatrix} = -\begin{bmatrix} L^2_{yx} \\ D(\lambda(x)) B_x^T(x, y(x)) \end{bmatrix};$$
the derivative of the function $\varphi(x)$ is given by the formula (7.29).

This theorem is a slight modification of Theorem 6 in Chapter 2 of A. Fiacco and G. McCormick [1], which contains a proof of Theorem 6 based on the application of the implicit function theorem.


8. NECESSARY CONDITIONS FOR A MINIMUM 


IN OPTIMAL CONTROL PROBLEMS 


1. THE MAIN PROBLEM OF OPTIMAL CONTROL


Assume that the behavior of a controllable process is described by the system of ordinary differential equations
$$\frac{dx(t)}{dt} = f(x(t), u(t), t), \qquad 0 \le t \le T, \qquad x(0) = x_0, \tag{8.1}$$
where the vector $x(t) \in E^n$ is usually known as a state vector, and the vector $u(t) \in E^r$ is said to be the vector of controls. All the components of the vector function $f(x,u,t)$ are differentiable in the aggregate of the variables $x$ and $u$. We consider first the case where the interval $T$ is given and the initial state vector $x_0$ is fixed. In many applied problems, time plays the role of the independent variable $t$ and the system (8.1) describes a dynamic process; hence it is often referred to as a fixed-time problem. We say that the case where no constraints are imposed on the vector $x(T)$ is a problem with a free right endpoint.

We shall say that a piecewise-continuous function $u = u(t)$ attaining on an interval $0 \le t \le T$ arbitrary values in some specified set $U \subset E^r$ is a feasible control.

A given feasible control $u(t)$ uniquely determines the continuous piecewise-differentiable solution $x = x(t)$ of the system (8.1) on the interval $0 \le t \le T$. The functional
$$R = b(x(T)) \tag{8.2}$$
is the control performance criterion. The function $b(x)$ is assumed to be everywhere differentiable.

The problem consists in finding the feasible control law $u(t)$ and the corresponding solution $x(t)$ to the system (8.1), such that the functional $R$ attains the least possible value.

This problem of optimal control differs from the problem 
considered in classical Calculus of Variations only in the new 
requirement u(t) « U. The necessary condition for a minimum 
for such problems was first stated and proved by L.S. Pontryagin 
and his colleagues, and is known as the Maximum Principle. Later 
on, we shall formulate this principle in a form different from 
the original. 

Let us consider the vector function $p(t)$ satisfying the following adjoint system of ordinary differential equations:
$$\frac{dp^i(t)}{dt} = -\sum_{j=1}^{n} \frac{\partial f^j(x(t), u(t), t)}{\partial x^i}\, p^j(t), \quad i \in [1:n]. \tag{8.3}$$
Using the adopted abridged notation, we can rewrite this system in a compact form:
$$\frac{dp(t)}{dt} = -f_x(x(t), u(t), t)\, p(t). \tag{8.3}$$
We require that at the end of the motion the condition
$$p(T) = b_x(x(T)) \tag{8.4}$$
be satisfied. We say that the vector function $p(t)$ thus defined is the adjoint multiplier, or an impulse. We introduce the auxiliary function
$$H(x, u, t, p) = \langle f(x,u,t), p \rangle,$$
usually called a Hamiltonian.

Any feasible control $u(t)$ defined on the interval $0 \le t \le T$ which minimizes the functional $R$ is said to be an optimal control and is denoted by $u_*(t)$; we say that the solution to the system (8.1) corresponding to the control $u_*(t)$ is an optimal trajectory and we denote it by $x_*(t)$. If in (8.3) and (8.4) we take $u_*(t)$ and $x_*(t)$ respectively for $u(t)$ and $x(t)$, we obtain the impulse $p_*(t)$. The necessary condition for a minimum in the optimal control problem is as follows.

THEOREM 1.8.1. Let $u_*(t)$ be an optimal control and let $x_*(t)$ be an optimal trajectory of the system (8.1). Then there exists an impulse $p_*(t)$ such that for $0 \le t \le T$ the following conditions are satisfied:

1. the minimum of the function $H(x_*(t), u, t, p_*(t))$ with respect to $u$ on the set $U$ is attained at the point $u = u_*(t)$, i.e.,
$$u_*(t) \in \operatorname{Arg} \min_{u \in U} H(x_*(t), u, t, p_*(t)); \tag{8.5}$$

2. if the right sides of the system (8.1) do not depend explicitly on $t$ (the system is autonomous), the function $H(x_*(t), u_*(t), p_*(t))$ is constant.
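The adjoint system (8.3) with the terminal condition (8.4) can be illustrated on a discretized problem, where the impulse gives the gradient of the functional with respect to the control values. The sketch below is a discrete analogue under stated assumptions, not the continuous theory itself: it uses a forward Euler scheme, the hypothetical scalar system $dx/dt = -x + u$ with $b(x) = x^2$, and hypothetical function names; the gradient with respect to $u_k$ is taken as $\Delta t \cdot H_u$ evaluated with $p_{k+1}$.

```python
import numpy as np

def simulate(u, x0=1.0, T=1.0):
    """Forward Euler for dx/dt = f(x, u) = -x + u (hypothetical system)."""
    dt = T / len(u)
    x = np.empty(len(u) + 1)
    x[0] = x0
    for k in range(len(u)):
        x[k + 1] = x[k] + dt * (-x[k] + u[k])
    return x, dt

def cost(u):
    """Functional R = b(x(T)) with b(x) = x^2."""
    x, _ = simulate(u)
    return x[-1] ** 2

def adjoint_gradient(u):
    """Gradient of R with respect to the control values via the discrete adjoint."""
    x, dt = simulate(u)
    n = len(u)
    f_x, f_u = -1.0, 1.0             # partial derivatives of f = -x + u
    p = np.empty(n + 1)
    p[n] = 2.0 * x[n]                # terminal condition p(T) = b_x(x(T))
    grad = np.empty(n)
    for k in range(n - 1, -1, -1):
        grad[k] = dt * f_u * p[k + 1]          # dt * H_u, with H = f * p
        p[k] = p[k + 1] + dt * f_x * p[k + 1]  # backward step of dp/dt = -f_x p
    return grad

def finite_difference_gradient(u, eps=1e-6):
    grad = np.empty(len(u))
    for k in range(len(u)):
        e = np.zeros(len(u))
        e[k] = eps
        grad[k] = (cost(u + e) - cost(u - e)) / (2.0 * eps)
    return grad

u = np.sin(np.linspace(0.0, 1.0, 20))
g_adj = adjoint_gradient(u)
g_fd = finite_difference_gradient(u)
```

One backward sweep yields all components of the gradient, whereas the finite-difference check needs two simulations per control value; this is the practical reason the impulse is used in gradient-type methods.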

This necessary condition for the minimum is said to be the Minimum Principle for optimal control problems. L. S. Pontryagin formulated the Maximum Principle, using in fact the impulse satisfying the system (8.3), but, in contrast to (8.4), the boundary condition was of the form
$$p(T) = -b_x(x(T)),$$
hence the condition (8.5) was written differently:
$$u_*(t) \in \operatorname{Arg} \max_{u \in U} H(x_*(t), u, t, p_*(t)).$$
Therefore, the difference is not essential; however, the formulation we have used is more convenient, especially in considering game problems and in establishing the minimum principle satisfying analogous necessary conditions for the minimum for nonlinear programming problems.

Next we make use of the results of Section 4. By a theorem of that section, if for a fixed $t$ the function $H(x_*(t), u, t, p_*(t))$ is differentiable in $u$ at the point $u = u_*(t)$ and the set $U$ is convex, it then follows from (8.5) that the condition
$$u_*(t) \in \operatorname{Arg} \min_{u \in U} \langle H_u(x_*(t), u_*(t), t, p_*(t)),\ u - u_*(t) \rangle \tag{8.6}$$
is satisfied. This necessary condition is usually called a linearized minimum principle.
At present, many different proofs of Theorem 1.8.1 are available. They are published in many books; therefore we shall omit the proof herein. We only refer to the monograph of L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko [1], and also to N. N. Moiseev [1] and to R. Gabasov and F. M. Kirillova [2].

We may make an attempt to construct a numerical method for solving optimal control problems using the minimum principle. For each fixed set $x_*$, $t$, $p_*(t)$ we find from the condition (8.5) the point-set mapping
$$u_* = \theta(x_*, p_*, t), \tag{8.7}$$
on which the Hamiltonian $H(x_*, u, t, p_*)$ reaches the minimum in $u \in U$. Upon substitution of (8.7) into (8.1) and (8.3) we obtain the system consisting of $2n$ ordinary differential equations for the $n$-dimensional vector functions $x_*$, $p_*$:
$$\frac{dx_*}{dt} = f(x_*, \theta(x_*, p_*, t), t), \qquad \frac{dp_*}{dt} = -f_x(x_*, \theta(x_*, p_*, t), t)\, p_*. \tag{8.8}$$
For this system $n$ conditions are given at the initial time (at the left endpoint) and $n$ conditions at the time $T$ (at the right endpoint):
$$x_*(0) = x_0, \qquad p_*(T) = b_x(x_*(T)). \tag{8.9}$$


Thus, the solution of the initial problem is formally reduced to that of the boundary value problem for the system of $2n$ differential equations. If one could solve this problem, that is, define the functions $x_*(t)$, $p_*(t)$ satisfying the system (8.8) and the conditions (8.9), one would find the optimal control $u_*(t)$ for the initial problem among the functions representable as (8.7). This technique is quite convenient for the analytical solution of the initial problem. However, it is of questionable utility in practical computations. The reason is that the boundary value problem obtained is usually essentially nonlinear. The mappings $\theta$ are often discontinuous, which leads to the situation that the right sides of (8.8) become nonsmooth or even non-single-valued. The boundary value problem does not always have a unique solution, and the procedures of solving this problem are frequently divergent. The best that can be done for such a problem is to determine a good initial approximation; only then does a real possibility of solving the boundary value problem develop. These circumstances compel # us in the case of optimal control problems to seek different computational methods; we shall describe them in a later chapter.
2. OPTIMIZATION WITH RESPECT TO CONTROL PARAMETERS 


Among optimal control problems one frequently comes across cases
where optimization consists not only in choosing the control
function u(t) but also in finding the optimal value of a vector of
additional control parameters $\xi \in E^s$. The system (8.1) has in
this case the form

    \frac{dx}{dt} = f(x, u, t, \xi).                            (8.10)

The functional to be minimized also depends on $\xi$: $R = b(x(T), \xi)$.

It is required to find a feasible control function u(t), a vector

of control parameters $\xi$ and the corresponding solution x(t) to
(8.10), such that the functional R assumes the smallest possible
value.

# Editor's note: Not quite! See J. Stoer and R. Bulirsch, Introduction to
Numerical Analysis. Berlin Heidelberg New York Tokyo:
Springer-Verlag, 1980.

The necessary condition for a minimum is given by Theorem
1.8.1, in which instead of f(x,u,t) and b(x) respectively
$f(x,u,t,\xi)$ and $b(x,\xi)$ are taken and one additional condition is
introduced for a minimum of R in $\xi$:

    b_\xi(x(T), \xi) + \int_0^T f_\xi(x(t), u(t), t, \xi)\, p(t)\, dt = 0.      (8.11)

This condition must be satisfied along the optimal trajectory,
i.e., for $u(t) = u_*(t)$, $x(t) = x_*(t)$, $p(t) = p_*(t)$, $\xi = \xi_*$,
where $\xi_*$ is the optimal vector of control parameters. The
condition (8.11) adds to (8.8) and (8.9) s scalar relations necessary
for defining the components of the vector $\xi_*$.

Theorem 1.8.1 and the assertions just listed are fundamental 
results in optimal control theory. We shall show next how, using
these results, various particular problems can be studied.


3. THE PROBLEM WITH FIXED TIME AND FREE RIGHT ENDPOINT 


In the main problem formulated above, we shall require in addition
that the state vector x(t) at a given finite time T belong to
the terminal manifold

    X = \{x \in E^n : g(x) = 0,\ h(x) \le 0\},

where the differentiable vector functions g and h are the
mappings

    g: E^n \to E^e, \qquad h: E^n \to E^c.

As in the case of nonlinear programming problems, we
take for the functional a function analogous to (6.25):

    R(x, u, v, q) = q\, b(x) + \sum_{i=1}^{e} u^i g^i(x) + \sum_{j=1}^{c} v^j h^j(x),      (8.12)

where $q \in E^1$, $u \in E^e$, $v \in E^c_+$.


The assertion of Theorem 1.8.1 remains valid; one should
only make some changes: instead of (8.4) the following condition
is to be used:

    p(T) = R_x(x(T), u, v, q).                                  (8.13)

Moreover, the complementarity conditions must be satisfied:

    v^j h^j(x(T)) = 0, \qquad j \in [1:c].

If the conditions at the right endpoint satisfy the constraint
qualification, we may put q = 1; the new unknown dual variables
u, v are "compensated" by the condition $x(T) \in X$. Using the
minimum principle, we can reduce this problem to a boundary value
problem.


4. LAGRANGE'S PROBLEM. MAYER'S PROBLEM. BOLZA'S PROBLEM


For Lagrange's problem the functional to be minimized will be
written as the integral

    R = \int_0^T B(x(t), u(t), t)\, dt,

where B denotes a differentiable function of x and u.
Next, we introduce an additional state variable $x^{n+1}$ through
the equation

    \frac{dx^{n+1}}{dt} = B(x, u, t), \qquad x^{n+1}(0) = 0.

This problem can now be formulated as the problem of finding the
control u(t), ensuring on a fixed interval [0,T] the minimum
of the terminal functional

    R = x^{n+1}(T)

in the presence of the relationships

    \frac{dx}{dt} = f(x, u, t), \qquad \frac{dx^{n+1}}{dt} = B(x, u, t).

Thus, in the expanded state space $E^{n+1}$ the problem has reduced
to the main problem considered above. For the Hamiltonian we
take the function

    H(x, x^{n+1}, u, t, p, p^{n+1}) = \langle f(x, u, t), p \rangle + B(x, u, t)\, p^{n+1}.

Eq. (8.3) and the condition (8.4) are replaced by the following:

    \frac{dp}{dt} = -f_x p - B_x, \qquad p(T) = 0, \qquad p^{n+1}(t) \equiv 1.

If it is required that at the end of the motion $x(T) \in X$, it
then follows from (8.13) that the condition

    p(T) = \sum_{i=1}^{e} u^i g^i_x(x(T)) + \sum_{j \in \sigma(x(T))} v^j h^j_x(x(T))

has to be satisfied, where $\sigma(x(T))$ denotes the set of active
constraints of the inequality type. The last condition is usually
referred to in Variational Calculus as the transversality
condition; it has a simple geometric meaning: the vector $p(T) \in E^n$
must be orthogonal to the tangent subspace to the terminal
manifold X.
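
The reduction of Lagrange's problem to a terminal functional can be illustrated numerically. The following is a minimal sketch, not taken from the book: the names `rk4_step` and `lagrange_cost` and the test problem are assumptions chosen for illustration. It appends the extra state $x^{n+1}$ to the system and reads the value of the integral functional off as $x^{n+1}(T)$.

```python
def rk4_step(rhs, z, t, h):
    """One classical Runge-Kutta step for dz/dt = rhs(z, t)."""
    k1 = rhs(z, t)
    k2 = rhs([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], t + 0.5 * h)
    k3 = rhs([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], t + 0.5 * h)
    k4 = rhs([zi + h * ki for zi, ki in zip(z, k3)], t + h)
    return [zi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]


def lagrange_cost(f, B, x0, control, T, steps=1000):
    """Return x^{n+1}(T), i.e. the integral of B along the trajectory."""
    def rhs(z, t):
        x, u = z[:-1], control(t)
        # first n components: dx/dt = f; last component: dx^{n+1}/dt = B
        return list(f(x, u, t)) + [B(x, u, t)]

    z = list(x0) + [0.0]          # initial condition x^{n+1}(0) = 0
    h = T / steps
    for k in range(steps):
        z = rk4_step(rhs, z, k * h, h)
    return z[-1]


# Hypothetical test problem: dx/dt = u, x(0) = 1, u(t) = -1, B = x^2 + u^2.
# Then x(t) = 1 - t, and the integral over [0, 1] is 1/3 + 1 = 4/3.
cost = lagrange_cost(lambda x, u, t: [u],
                     lambda x, u, t: x[0] ** 2 + u ** 2,
                     [1.0], lambda t: -1.0, 1.0)
print(cost)
```

The same augmentation works for any integrator; the fourth-order Runge-Kutta scheme is used here only to keep the sketch self-contained.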

Mayer's problem with fixed time consists in minimizing a
functional of the form

    R = b(x(T), T).                                             (8.14)

Introducing the additional state variable $x^{n+1} = t$ we reduce the
system (8.1) to the autonomous system

    \frac{dx}{dt} = f(x, u, x^{n+1}), \qquad \frac{dx^{n+1}}{dt} = 1,
                                                                (8.15)
    R = b(x(T), x^{n+1}(T)).

The expanded adjoint multiplier consists of the vector p(t)
satisfying the system (8.3), and the scalar $p^{n+1}$ for which

    \frac{dp^{n+1}}{dt} = -\langle f_t(x, u, t), p \rangle       (8.16)

with terminal conditions

    p(T) = b_x(x(T), T), \qquad p^{n+1}(T) = b_t(x(T), T).      (8.17)

The Hamiltonian of the expanded system is constant along the
optimal trajectory:

    \langle f(x, u, t), p(t) \rangle + p^{n+1}(t) \equiv \mathrm{const}.

The problem has thus been reduced to the basic problem.


In Bolza's problem the functional is given by

    R = b(x(T), T) + \int_0^T B(x, u, t)\, dt.

We introduce two additional state variables

    \frac{dx^{n+1}}{dt} = 1, \qquad \frac{dx^{n+2}}{dt} = B(x, u, x^{n+1});

then the functional to be minimized has the form

    R = b(x(T), x^{n+1}(T)) + x^{n+2}(T).

In the extended state space $E^{n+2}$ the problem is again reduced to
the basic problem; Theorem 1.8.1 is then applicable.


5. THE PROBLEM WITH FREE END TIME T


It is required to find the vector of the controls u(t), the
interval [0,T] and the corresponding solution x(t) to the system
(8.1), so that the terminal functional (8.14) attains the smallest
possible value. We suppose that such an interval exists and is
nonzero.

Putting $t = \tau\xi$, we make a substitution of the independent
variable in system (8.15). Then the system (8.15) becomes

    \frac{dx}{d\tau} = \xi f(x, u, x^{n+1}), \qquad \frac{dx^{n+1}}{d\tau} = \xi.      (8.18)

We consider that the new independent variable $\tau$ changes on the
fixed interval $[0, T_0]$. For the system (8.18) we shall be seeking
the control vector $u(\tau)$ and the control parameter $\xi$ so that
the functional $b(x(T_0), x^{n+1}(T_0))$ attains the
smallest possible value.

We have arrived at the problem with fixed time, the free
right endpoint, and one control parameter. It follows from (8.11)
that along the optimal trajectory

    \int_0^{T_0} [\langle f(x(\tau), u(\tau), x^{n+1}(\tau)), p(\tau) \rangle + p^{n+1}(\tau)]\, d\tau = 0      (8.19)

is satisfied. We take into account that the integrand is constant.
We proceed to the initial variables and, using the condition at
the right endpoint (8.17), we obtain

    \langle f(x(T), u(T), T), p(T) \rangle + b_t(x(T), T) = 0,
                                                                (8.20)
    p(T) = b_x(x(T), T).

Upon solving the optimization problem (that is, upon finding $u(\tau)$
and $\xi$), the optimal motion time is defined by the formula
$T = T_0 \xi$, where $T_0$ denotes the arbitrary positive number
specified before the calculations (one may always assume, for example,
that $T_0 = 1$). The assertion of Theorem 1.8.1 concerning the
property (8.5) remains valid; the first relationship in (8.20) is
formally the additional condition for defining the optimal value


Oil Wek siersao@re qwabers Ah, 
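
The change of independent variable $t = \tau\xi$ can be checked on a small example. The sketch below is an illustration under stated assumptions, not the book's program: the function name `integrate_scaled`, the right-hand side and the trial value of $\xi$ are all hypothetical. It integrates the scaled system (8.18) on the fixed interval $[0, T_0]$ with $T_0 = 1$ and compares the result with the known solution of the original system at $T = T_0 \xi$.

```python
import math


def integrate_scaled(f, x0, control, xi, T0=1.0, steps=2000):
    """RK4 integration of (8.18) on [0, T0]; returns (x(T0), x^{n+1}(T0))."""
    def rhs(z, tau):
        x, clock = z[:-1], z[-1]          # clock plays the role of x^{n+1} = t
        u = control(tau)
        return [xi * fi for fi in f(x, u, clock)] + [xi]

    z = list(x0) + [0.0]
    h = T0 / steps
    for k in range(steps):
        tau = k * h
        k1 = rhs(z, tau)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(z, k1)], tau + 0.5 * h)
        k3 = rhs([a + 0.5 * h * b for a, b in zip(z, k2)], tau + 0.5 * h)
        k4 = rhs([a + h * b for a, b in zip(z, k3)], tau + h)
        z = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
    return z[:-1], z[-1]


# Hypothetical example: dx/dt = -x + u + t with u = 0 and x(0) = 1,
# whose exact solution is x(t) = t - 1 + 2 exp(-t).
xi = 2.0                                   # trial value of the parameter
x_end, t_end = integrate_scaled(lambda x, u, t: [-x[0] + u + t],
                                [1.0], lambda tau: 0.0, xi)
exact = xi - 1.0 + 2.0 * math.exp(-xi)     # x(T) at T = T0 * xi
print(x_end[0], exact, t_end)
```

In an optimization loop $\xi$ would be varied together with the control, while the integration interval stays fixed; that is the practical point of the substitution.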


6. THE MINIMUM TIME PROBLEM 


It is required to find the vector u(t), the corresponding
solution x(t) to the system (8.1), so that the vector x(t) reaches
the terminal manifold X in the shortest possible time.

The interval of variation of the independent variable in this
case is unknown. Hence, as in the three previous cases, we
introduce the new state variable $x^{n+1}$, go over to the system (8.15),
substitute the independent variable putting $t = \tau\xi$, and obtain the
system (8.18), for which it is required to find the control $u(\tau)$
and the parameter $\xi$ on the fixed interval $[0, T_0]$ so that
$x(T_0) \in X$ and the functional $x^{n+1}(T_0)$ attains the smallest
possible value. Now we introduce the auxiliary function R
analogous to (8.12):

    R = q\, x^{n+1} + \sum_{i=1}^{e} g^i(x) u^i + \sum_{j=1}^{c} h^j(x) v^j.

Here $q \ge 0$, $v \ge 0$. The necessary minimum condition (8.11) leads
to (8.19), from which it follows that

    H(x(T), u(T), T, p(T)) = -q \le 0;


in addition, the complementarity condition $h^j(x) v^j = 0$ appears,
as well as the conditions $g(x(T)) = 0$, $h(x(T)) \le 0$.

Many other problems of optimal control can be reduced to the
main problem in a similar way. Theorem 1.8.1 is not applicable,
however, in a quite widely known and important case when mixed
constraints are imposed on the state coordinates and controls
along a trajectory, of the form

    g(x(t), u(t), t) = 0, \qquad h(x(t), u(t), t) \le 0.

Properties of such problems are investigated in Dubovitskij
and Milyutin [1], Smol'yakov [1], and in Anorov [1], wherein the
notation for the necessary conditions for a minimum is more
complicated and not very suitable for devising numerical methods. We
shall not dwell on these conditions, since we do not need them in
the future in order to construct numerical methods, which will be
based on other ideas originating from Nonlinear Programming.


Chapter 2 


CONVERGENCE THEOREMS AND 
THEIR APPLICATION TO 
THE INVESTIGATION OF NUMERICAL METHODS 


In this chapter we cite the main mathematical results which will 
be used in the subsequent chapters in justifying numerical methods. 
We discuss first the methods of stability theory: stability of the 
first order approximation and the method of Lyapunov functions; 
then we give theorems on convergence of contraction mappings and
point-set mappings, as well as methods for solving systems of
nonlinear equations and methods for finding the minimax.


1. STABILITY OF THE FIRST ORDER APPROXIMATION 


1. BASIC DEFINITIONS

We consider the system of ordinary differential equations

    \frac{dx}{dt} = f(x, t),                                    (1.1)

where the vector function f(x,t) is defined for $x \in E^n$,
$t \in I = \{t : t \ge 0\}$, is continuously differentiable over x
everywhere on $E^n$ and for any x is continuous in t on the set I.
In this case, the initial condition $x(0) = x_0$ defines a unique
solution of the system (1.1), which we will denote by $x = x(x_0, t)$.

DEFINITION 2.1.1. We say that the system (1.1) is Lagrange stable
if for any $x_0$:

1. the solution $x(x_0, t)$ exists for all $t \in I$;
2. the norm $\|x(x_0, t)\|$ is bounded on I.

DEFINITION 2.1.2. The solution $x(\bar{x}_0, t)$ of the system (1.1) is
said to be Lyapunov stable as $t \to \infty$ (or, in short, stable), if
for any $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon)$ such that:

1. every solution $x(x_0, t)$ of the system (1.1) satisfying
the condition

    \|x_0 - \bar{x}_0\| < \delta                                 (1.2)

is defined for $0 \le t < \infty$;

2. for these solutions we have the inequality

    \|x(x_0, t) - x(\bar{x}_0, t)\| < \varepsilon \quad \text{for all } t \in I.

In other words, the solution $x(\bar{x}_0, t)$ is stable if all
other solutions originating in a sufficiently small neighborhood
of the point $\bar{x}_0$ remain for any $t \in I$ inside the
$\varepsilon$-neighborhood constructed around the solution
$x(\bar{x}_0, t)$. The stability implies a continuous, uniform in
$t \in I$, dependence of the solutions $x(x_0, t)$ of the system
(1.1) on the initial point $x_0$.

DEFINITION 2.1.3. The solution $x(\bar{x}_0, t)$ $(0 \le t < \infty)$ is said to
be asymptotically stable (as $t \to \infty$), if:

1. this solution is Lyapunov stable;
2. each solution $x(x_0, t)$ satisfying (1.2) possesses the
property

    \lim_{t \to \infty} \|x(x_0, t) - x(\bar{x}_0, t)\| = 0.      (1.3)

DEFINITION 2.1.4. The solution $x(\bar{x}_0, t)$ $(0 \le t < \infty)$ is said to
be globally asymptotically stable if this solution is stable and
the condition (1.3) holds for the solutions $x(x_0, t)$ of the
system (1.1) for any $x_0 \in E^n$.

It is often convenient to transform the system (1.1) so as to make
the point x = 0 an equilibrium point, i.e., $f(0, t) \equiv 0$ for
any $t \in I$. In this case, the system (1.1) has a trivial solution
$x(0, t) \equiv 0$ which we denote by $x(t) \equiv 0$. We can reformulate
Definitions 2.1.3 and 2.1.4 as follows.
DEFINITION 2.1.5. The trivial solution $x(t) \equiv 0$ of the system
(1.1) is said to be asymptotically stable if for any $\varepsilon > 0$ there
exists $\delta = \delta(\varepsilon)$ such that for any solution to (1.1) satisfying
the condition $\|x_0\| < \delta$, we have $\|x(x_0, t)\| < \varepsilon$ for all $t \in I$, and

    \lim_{t \to \infty} x(x_0, t) = 0.                           (1.4)

DEFINITION 2.1.6. The trivial solution $x(t) \equiv 0$ of the system
(1.1) is called globally asymptotically stable if:

1. it is Lyapunov stable;
2. for any $x_0 \in E^n$ the condition (1.4) is satisfied.

DEFINITION 2.1.7. The trivial solution $x(t) \equiv 0$ of the system
(1.1) is said to be exponentially stable if there exists a
neighborhood G of the origin such that for any $x_0 \in G$ we have the
inequality

    \|x(x_0, t)\| \le N \|x_0\| e^{-kt},                          (1.5)

where N, k are some positive numbers not depending on the
choice of the point $x_0$.

The exponential stability of the trivial solution $x(t) \equiv 0$
implies the asymptotic stability of this solution.
Indeed, for an arbitrary $\varepsilon > 0$ we put

    \delta = \varepsilon / N, \qquad \|x_0\| < \delta;

then it follows from (1.5) that $\|x(x_0, t)\| < \varepsilon$ and, furthermore,
(1.4) holds.

Methods for assuring asymptotic stability are quite useful 
for justifying the convergence of numerical methods of optimiza- 
tion. Some numerical methods reduce to finding limit points for a 
system of ordinary differential equations of the form (1.1), with
solutions $x(x_0, t)$ tending as $t \to \infty$ to solutions of the initial


optimization problem. We say in this case that the system (1.1) 


is a numerical method for solving the optimization problem. 


2. AUXILIARY LEMMA 


LEMMA 2.1.1 (Gronwall). Let the functions u(t) and v(t) be
continuous for $t \ge a$, let $C > 0$ and let for $t \ge a$ the
inequality

    |u(t)| \le C + \int_a^t |u(s)|\,|v(s)|\, ds

be satisfied. Then for $t \ge a$ we have the inequality

    |u(t)| \le C \exp \int_a^t |v(s)|\, ds.

Proof. Let us multiply both sides of the initial inequality by
$|v(t)|$:

    |u(t)|\,|v(t)| \le |v(t)| \left[ C + \int_a^t |u(s)|\,|v(s)|\, ds \right].

Let $w(t) = \int_a^t |u(s)|\,|v(s)|\, ds$. Then the inequality obtained can
be represented as follows:

    \frac{dw}{dt} \le |v(t)|\,(C + w(t)),

or, noting that $0 < C + w(t)$, we have

    \frac{dw/dt}{C + w(t)} \le |v(t)|.

Integrating both sides, we find

    \ln(C + w(t)) - \ln C \le \int_a^t |v(s)|\, ds,

which implies

    C + w(t) \le C \exp \int_a^t |v(s)|\, ds.

Taking into account the initial inequality, we obtain

    |u(t)| \le C + w(t) \le C \exp \int_a^t |v(s)|\, ds.

We have thus arrived at the required inequality. ///
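
A quick numerical sanity check of the lemma may help fix the roles of u, v and C. The sketch below is only an illustration: the helper name `gronwall_check`, the trapezoidal quadrature and the sample functions are assumptions, not part of the text. On a grid it tests both the hypothesis and the conclusion of the lemma.

```python
import math


def gronwall_check(u, v, a, b, C, n=1000, tol=1e-12):
    """Check hypothesis and conclusion of Gronwall's lemma on a grid.

    Integrals are approximated by the trapezoidal rule, so the result is a
    numerical indication only, not a proof.
    """
    h = (b - a) / n
    int_uv = 0.0      # approximates the integral of |u(s)||v(s)| over [a, t]
    int_v = 0.0       # approximates the integral of |v(s)| over [a, t]
    hypothesis = conclusion = True
    for i in range(n + 1):
        t = a + i * h
        if i > 0:
            s = t - h
            int_uv += 0.5 * h * (abs(u(s) * v(s)) + abs(u(t) * v(t)))
            int_v += 0.5 * h * (abs(v(s)) + abs(v(t)))
        hypothesis = hypothesis and abs(u(t)) <= C + int_uv + tol
        conclusion = conclusion and abs(u(t)) <= C * math.exp(int_v) + tol
    return hypothesis, conclusion


# Hypothetical sample data: u(t) = 1 + t/2, v(t) = 1/(1 + t), C = 1, a = 0.
# Here C * exp(integral of v) = 1 + t, which indeed dominates u(t).
hyp_ok, concl_ok = gronwall_check(lambda t: 1.0 + 0.5 * t,
                                  lambda t: 1.0 / (1.0 + t),
                                  0.0, 5.0, 1.0)
print(hyp_ok, concl_ok)
```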


3. THE MAIN THEOREM 
We consider a system of the special form

    \frac{dx}{dt} = Ax + \varphi(x, t),                          (1.6)

where A is a square matrix of the order n.

DEFINITION 2.1.8. The function v(x,t) has an infinitesimal upper
limit as $x \to 0$, if for any $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon)$
such that $|v(x, t)| < \varepsilon$ for $\|x\| < \delta$ and all $t \in I$.

THEOREM 2.1.1 (Lyapunov's Theorem on Stability of the First Order
Approximation). Let the matrix A in the system (1.6) be constant
and let all the eigenvalues of A have negative real parts.
Suppose the function $\|\varphi(x, t)\| / \|x\|$ has an infinitesimal upper
limit as $x \to 0$. Then the trivial solution of the
system (1.6) is exponentially stable.

Proof. For the system (1.6) we fix an initial point $x_0 \in G$,
where G is some neighborhood of the origin. A solution of the
system (1.6) satisfies the integral equation:

    x(x_0, t) = e^{tA} x_0 + \int_0^t e^{(t-s)A} \varphi(x(x_0, s), s)\, ds.      (1.7)

Here the matrix $e^{tA}$ is definable by the formula

    e^{tA} = I + \sum_{i=1}^{\infty} \frac{t^i A^i}{i!}

and satisfies the system

    \frac{d}{dt} e^{tA} = A e^{tA}.

Differentiating (1.7) over t with these properties taken
into account, we obtain that each solution to (1.7) satisfies (1.6).
From the fact that the eigenvalues of the matrix A have negative
real parts it follows that for $t \ge 0$ there exist $k > 0$, $\eta > 0$
such that

    \|e^{tA}\| \le k e^{-\eta t}.

Using the last inequality, we obtain from (1.7):

    \|x(x_0, t)\| \le k \|x_0\| e^{-\eta t} + k \int_0^t e^{-\eta (t-s)} \|\varphi(x(x_0, s), s)\|\, ds.

For any $\varepsilon > 0$ there exists $\delta$ such that $\|\varphi(x, t)\| \le (\varepsilon / k) \|x\|$
for each $\|x\| < \delta$ and for any $t \ge 0$. Let $\|x_0\| < \delta$. Then we
can find $T > 0$ such that for all $0 \le t \le T$ the inequality
$\|x(x_0, t)\| < \delta$ is satisfied and

    e^{\eta t} \|x(x_0, t)\| \le k \|x_0\| + \varepsilon \int_0^t e^{\eta s} \|x(x_0, s)\|\, ds.

We use Lemma 2.1.1 and obtain the estimate

    \|x(x_0, t)\| \le k \|x_0\| e^{-(\eta - \varepsilon) t},       (1.8)

which holds for $0 \le t \le T$. We choose $\varepsilon$ so small that $\varepsilon < \eta$.
Then it follows from the inequality obtained that $\|x(x_0, t)\| \le k \|x_0\|$
as long as $\|x(x_0, t)\| < \delta$. If $\|x_0\| < \delta / k$, then (1.8) holds for all
$t \ge 0$, which completes the proof of the Theorem. ///

The analysis of stability of a large class of systems is
reduced to the investigation of equations of the form (1.6). Indeed,
let the point $a \in E^n$ be an equilibrium for the system (1.1),
i.e., $f(a, t) \equiv 0$. Putting $y = x - a$, we change the
variables. Then we arrive at the system

    \frac{dy}{dt} = f(a + y, t).

Assume that the vector function f(x,t) is differentiable over
x at the point x = a; then the system can be represented in
the form

    \frac{dy}{dt} = f(a, t) + f_x(a, t)\, y + \|y\|\, \gamma(y, t) = Ay + \varphi(y, t),

where

    A = f_x(a, t), \qquad \varphi(y, t) = \|y\|\, \gamma(y, t),

    \lim_{y \to 0} \|\gamma(y, t)\| = 0.                          (1.9)

The system (1.1) is thus reduced to (1.6); and if the matrix A
is constant, and the condition (1.9) holds uniformly in t, we
can use Theorem 2.1.1. For the exponential stability of the
equilibrium $x(t) \equiv a$ of the system (1.1) it will be sufficient that
all eigenvalues of the matrix A have negative real parts. The
problem is reduced to the investigation of a linearized system
(the system of the first-order approximation, or, as frequently
referred to, variational equations): $dy/dt = Ay$.
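
The linearization test described above is easy to try on a small autonomous example. The following sketch is an illustration only: the damped-pendulum right-hand side and the helper names `jacobian` and `eig2` are assumptions, and the eigenvalue formula is valid for 2-by-2 matrices only. It approximates $A = f_x(a)$ by central differences and inspects the real parts of the eigenvalues.

```python
import cmath
import math


def jacobian(f, a, h=1e-6):
    """Central-difference approximation of the matrix A = f_x(a)."""
    n = len(a)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        fp, fm = f(ap), f(am)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J


def eig2(J):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


# Hypothetical example: damped pendulum, equilibrium a = (0, 0).
def pendulum(z):
    return [z[1], -math.sin(z[0]) - 0.5 * z[1]]


lam1, lam2 = eig2(jacobian(pendulum, [0.0, 0.0]))
stable = max(lam1.real, lam2.real) < 0.0
print(lam1, lam2, stable)
```

Negative real parts for both eigenvalues indicate, by Theorem 2.1.1, exponential stability of the equilibrium of this particular example; for larger systems a general eigenvalue routine would be needed.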

The exponential stability of this system implies the
exponential stability of the equilibrium point for the system (1.1). The
technique described is often used to justify the convergence of
numerical methods. As will be shown in Subsection 3.5, these
results will also imply the local convergence of the discrete
approximation of the system (1.1) for a sufficiently small step of
integration.


2. THE METHOD OF LYAPUNOV FUNCTIONS


1. LAGRANGE STABILITY 


In this subsection we shall denote by $\{t_i\}$ an infinitely
increasing sequence of times t tending to infinity, and by $t_i$ the
i-th element.

THEOREM 2.2.1. Let in the system (1.1) $f(x, t) \in C(E^n \times I)$. For
a Lagrange stability of the system (1.1) it is sufficient that on
$E^n \times I$ there exists a function v(x,t) such that:

1. $w(x) \le v(x, t)$, where w(x) is a continuous, infinitely large
function;
2. for each solution $x(x_0, t)$ of the system (1.1) the
function $v(x(x_0, t), t)$ is nonincreasing with respect to the variable
t.

Proof. For any $t \ge 0$ we have

    w(x(x_0, t)) \le v(x(x_0, t), t) \le v(x_0, 0).              (2.1)

Then the solution $x(x_0, t)$ is bounded. Indeed, if this is not
the case, we can find a sequence $\{t_i\}$ converging to $T \le \infty$
such that

    \lim_{t_i \to T} \|x(x_0, t_i)\| = \infty

and for the infinitely large function w we obtain

    \lim_{t_i \to T} w(x(x_0, t_i)) = \infty,

which contradicts the inequalities (2.1). Thus, the solution
$x(x_0, t)$ is unboundedly extendable (to the right) and

    \sup_{t \in I} \|x(x_0, t)\| < \infty.

REMARKS. Condition (2) of the Theorem will a priori be satisfied
if the function v(x,t) is differentiable everywhere over both
arguments and its total derivative with respect to the solutions
of (1.1) is nonpositive on $E^n \times I$, i.e.,

    \frac{dv(x, t)}{dt} = v_t(x, t) + \langle v_x(x, t), f(x, t) \rangle \le 0.

This condition can be somewhat weakened by requiring it to
hold everywhere outside some bounded set.


2. LYAPUNOV'S THEOREMS ON STABILITY


We consider the case where the process is described by the
autonomous system of differential equations

    \frac{dx}{dt} = f(x),                                       (2.2)

that is, the right sides of the system do not depend on t. We
assume that f(0) = 0 and f(x) satisfies a Lipschitz condition
in some neighborhood G of the point x = 0.

We say that the continuous function v(x) is positive definite
on G if $v(x) > 0$ everywhere on G except the point x = 0,
where v(x) = 0. Similarly, if $v < 0$ everywhere on G, except
the point x = 0, where v(x) = 0, we say that the function v(x)
is negative definite on G. If the inequality $v \ge 0$ ($v \le 0$)
holds everywhere on G, we say that the function v(x) on G is
nonnegative (nonpositive).

We denote by $S_\varepsilon$, $H_\varepsilon$ respectively the spherical surface of
an n-dimensional sphere centered at the origin, and its interior:

    S_\varepsilon = \{x \in E^n : \|x\| = \varepsilon\},

    H_\varepsilon = \{x \in E^n : \|x\| < \varepsilon\}.

THEOREM 2.2.2 (Lyapunov's Stability Theorem). If there exists a
differentiable function v(x) positive definite on G, the total
derivative of which, calculated through the system (2.2), is
nonpositive on G, then the trivial solution $x(t) \equiv 0$ of the
system (2.2) is Lyapunov stable.

Proof. Let $\varepsilon > 0$, $H_\varepsilon \subset G$. We write

    \lambda = \min_{x \in S_\varepsilon} v(x).

Since the function v(x) is continuous and v(0) = 0, one can
find $\delta \in (0, \varepsilon)$ such that

    \sup_{x \in H_\delta} v(x) = \lambda_0 < \lambda.

Let $x_0 \in H_\delta$; we consider the solution to (2.2): $x = x(x_0, t)$. If
this solution intersected the spherical surface $S_\varepsilon$ for some
$t = t_1$, we would obtain $v(x(x_0, t_1)) \ge \lambda$. On the other hand,

    \frac{dv(x)}{dt} = \langle v_x(x), f(x) \rangle \le 0.

Therefore, $v(x(x_0, t))$ is a non-increasing function of t; hence
$v(x(x_0, t)) \le v(x_0) \le \lambda_0 < \lambda$ and, therefore, the trajectory
(solution) $x(x_0, t)$ does not intersect the surface $S_\varepsilon$ for any
$t \ge 0$. ///

THEOREM 2.2.3 (Lyapunov's Asymptotic Stability Theorem). If
there exists a differentiable function v(x) positive definite
on G, whose total derivative in t, calculated through the
system (2.2), is negative definite on G, then the trivial solution
$x(t) \equiv 0$ of the system (2.2) is asymptotically stable.

Proof. By Theorem 2.2.2, for any $R > 0$ there exists $r \in (0, R)$
such that the condition $x_0 \in H_r \subset G$ implies that $x(x_0, t) \in H_R \subset G$
for any $t \ge 0$. We show that for any $\varepsilon \in (0, R)$ there exists T
such that $x(x_0, t) \in H_\varepsilon$ for all $t \ge T$.

For $\varepsilon$, one can take $0 < \delta < \min[\varepsilon, r]$ such that
$x(x_0, t) \in H_\varepsilon$ for all $t \ge 0$, provided that $x_0 \in H_\delta$. Let the
trajectory $x(x_0, t)$ not enter $H_\delta$ for any $t > 0$; then
$x(x_0, t) \in H_R \setminus H_\delta$ and

    v(x(x_0, t)) - v(x_0) = \int_0^t \frac{dv(x(x_0, \tau))}{d\tau}\, d\tau \le l t < 0,

where $l = \sup_{x \in H_R \setminus H_\delta} \langle v_x(x), f(x) \rangle$. Letting $t \to \infty$, we obtain
$\lim v(x(x_0, t)) < 0$. This contradicts the positive definiteness of
v(x) on $H_R \subset G$. Hence we can find T such that $x(x_0, T) \in H_\delta$,
but then the solution will not intersect $S_\varepsilon$ for any $t \ge T$.
Since $\varepsilon$ was arbitrarily small, we have $\lim_{t \to \infty} x(x_0, t) = 0$. ///

We shall say in the sequel that the functions v(x) that we 
have introduced to justify stability are Lyapunov functions, and
the method for proving via these functions is the method of Lyapu- 
nov functions. 
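
A small computation shows how a Lyapunov function is used in practice. The example below is a sketch under assumptions of our own: the system, the function $v = x^2 + y^2$ and all names are hypothetical. Its linearization at the origin has purely imaginary eigenvalues, so Theorem 2.1.1 gives no information, while $\dot{v} = -2(x^4 + y^4)$ is negative definite and Theorem 2.2.3 applies.

```python
def f(z):
    """Right-hand side of the hypothetical system dx/dt = -x^3 - y, dy/dt = x - y^3."""
    x, y = z
    return [-x ** 3 - y, x - y ** 3]


def v(z):
    """Candidate Lyapunov function v = x^2 + y^2."""
    return z[0] ** 2 + z[1] ** 2


def v_dot(z):
    """Total derivative <v_x, f> of v through the system."""
    fx, fy = f(z)
    return 2.0 * z[0] * fx + 2.0 * z[1] * fy


def simulate(z0, T=50.0, h=0.01):
    """RK4 trajectory; returns the values of v along it."""
    z, values = list(z0), [v(z0)]
    for _ in range(int(round(T / h))):
        k1 = f(z)
        k2 = f([a + 0.5 * h * b for a, b in zip(z, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(z, k2)])
        k4 = f([a + h * b for a, b in zip(z, k3)])
        z = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
        values.append(v(z))
    return values


# Sign of v_dot on a grid around the origin (the origin itself excluded).
grid = [(0.2 * i, 0.2 * j) for i in range(-5, 6) for j in range(-5, 6)
        if (i, j) != (0, 0)]
negative_on_grid = all(v_dot(p) < 0.0 for p in grid)

values = simulate([1.0, 0.5])
monotone = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(negative_on_grid, monotone, values[-1])
```

A grid test of this kind can only suggest negative definiteness; here it is confirmed by the closed-form expression for $\dot{v}$.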

In the Theorems given above the fact that the origin is an
equilibrium point for the system (2.2) is not essential. Possibly,
f(a) = 0; in this case we say that the Lyapunov function v(x)
is positive definite if v(a) = 0 and $v(x) > 0$ for all $x \ne a$
belonging to some neighborhood of the point a. Lyapunov functions
provide sufficient conditions for the trivial solution $x(t) \equiv a$
to be stable.

The method of Lyapunov functions is widely used in
investigating various engineering problems in which the system (2.2) is
specified and it is required to choose its parameters to ensure a
stable equilibrium. In the case of numerical methods of
optimization the situation is different: the system (2.2) is to be such
that the points of the solution of an initial optimization problem
are asymptotically stable equilibrium points. To prove the
negative definiteness of a derivative of a Lyapunov function,
sufficient conditions for the extremum in the initial problem are
usually exploited.


3. NON-AUTONOMOUS SYSTEMS 


The results obtained in Subsection 1.2 are extendable to a more 

general case of non-autonomous systems of the form (1.1). To this
end, instead of v(x) the function $v(x, t) \in C(Z)$, where
$Z = G \times I$, is introduced.

DEFINITION 2.2.1. We say that the function v(x,t) is positive
definite on Z if there exists a continuous function w(x)
defined on G and such that $0 < w(x) \le v(x, t)$ for $x \ne 0$, and
$w(0) = v(0, t) = 0$. Similarly, the function v(x,t) is negative
definite on Z if there exists a continuous function w(x) on G
such that $v(x, t) \le w(x) < 0$ for $x \ne 0$, and $w(0) = v(0, t) = 0$.

We assume that the system (1.1) is such that $f(0, t) \equiv 0$,
$f(x, t) \in C(Z)$. Then the following two theorems due to A.M.
Lyapunov hold.

THEOREM 2.2.4. If there exists a differentiable function
v(x,t), positive definite on Z, having a non-positive
derivative by virtue of the system (1.1), the trivial solution
$x(t) \equiv 0$ of the system (1.1) is Lyapunov stable.

THEOREM 2.2.5. If there exists a differentiable function
v(x,t), positive definite on Z, admitting an
infinitesimal upper limit as $x \to 0$ and having a negative definite
derivative in t by virtue of the system (1.1), the trivial
solution $x(t) \equiv 0$ of the system (1.1) is asymptotically stable.

The proof is basically the same as that of Theorems 2.2.2
and 2.2.3, and hence it is omitted.


4. ASYMPTOTIC STABILITY THEOREMS

We consider system (1.1), assuming that $f(0, t) \equiv 0$,
$f(x, t) \in C(Z)$. Here and below in this subsection, $Z = E^n \times I$.

DEFINITION 2.2.2. We say that the function v(x,t) admits
an infinitely large lower limit as $x \to \infty$ if $v(x, t) \to \infty$
uniformly in t as $x \to \infty$, i.e., for any $M > 0$ there exists
$R = R(M)$ such that $|v(x, t)| > M$ for all $\|x\| > R$, $t \in I$.

DEFINITION 2.2.3. We say that the function v(x,t) admits
a strong infinitesimal upper limit as $x \to 0$ if there exists
a function w(x) continuous on $E^n$ such that $v(x, t) \le w(x)$
for all $(x, t) \in Z$ and $w(0) = 0$.

THEOREM 2.2.6 (Barbashin-Krassovskij). Suppose there exists a
differentiable function v(x,t), positive definite on Z,
admitting a strong infinitesimal upper limit as $x \to 0$ and
an infinitely large lower limit as $x \to \infty$, the derivative $dv(x, t)/dt$
being, by virtue of the system (1.1), negative definite on Z.
Then the trivial solution $x(t) \equiv 0$ of the system (1.1) is
globally asymptotically stable.

The proof of this Theorem can be found, for example, in
B.P. Demidovich [1].


5. APPLICATION TO THE CONVERGENCE OF NUMERICAL METHODS


We consider the problem of finding the unconstrained minimum of
the differentiable function f(x). To solve the problem, Cauchy
suggested a method involving finding the limit points of the
following system of differential equations:

    \frac{dx}{dt} = -f_x(x), \qquad x(0) = x_0.                  (2.3)

The theorems given above enable us to obtain sufficient
conditions for convergence. We assume that there exists at least one
solution $x = x_*$ of the minimization problem. We use first the
method of Lyapunov functions and introduce the following three
functions:

    v_1(x) = f(x) - f(x_*), \qquad v_2(x) = \tfrac{1}{2} \|f_x(x)\|^2, \qquad v_3(x) = \tfrac{1}{2} \|x - x_*\|^2.

If we assume that $x_*$ is an isolated point of the local minimum
of the function f(x), there exists a neighborhood $G(x_*)$ in
which these functions are positive definite. Differentiating
them through the system (2.3), we obtain

    \dot{v}_1 = -\|f_x(x)\|^2 \le 0, \qquad \dot{v}_2 = -f_x^T(x) f_{xx}(x) f_x(x),

    \dot{v}_3 = \langle f_x(x), x_* - x \rangle.


If at the point $x = x_*$ the sufficient conditions of the
minimum given in Theorem 1.3.2 are satisfied, the functions $\dot{v}_1$
and $\dot{v}_2$ will be negative definite at least for $x \in G(x_*)$ and
the Lyapunov theorem on asymptotic stability implies the local
convergence of the method (2.3) to the point $x = x_*$.

If the function f is strictly convex, then, by the convexity
inequality, $\dot{v}_3 \le f(x_*) - f(x) < 0$ for $x \in G(x_*)$, $x \ne x_*$.
The derivative $\dot{v}_3$ is thus negative definite and, therefore, the
method (2.3) converges at least locally to the point $x = x_*$.

Now we invoke the Barbashin-Krassovskij theorem. Assuming
that the function f(x) is strictly convex everywhere on $E^n$, we
see that the minimization problem has a unique solution $x = x_*$
and the function $v_3(x)$ admits an infinitesimal upper limit as
$x \to x_*$, since $v_3(x_*) = 0$. Furthermore, by Theorem 1.1.2, $v_3(x)$
is an infinitely large function and hence admits an infinitely
large lower limit as $\|x\| \to \infty$. The derivative $\dot{v}_3$ is negative
definite everywhere on $E^n$; hence solutions to (2.3) converge to
the point $x = x_*$ globally. The method (2.3) is "relaxation"
since $f(x(x_0, t))$ is a monotone decreasing function of t.
Moreover, in this method, the norm of the gradient of f(x) and the
distance between the "current" point $x(x_0, t)$ and the minimum
point $x_*$ decrease monotonically.

If we use the theorem on stability of the first-order
approximation, we arrive at the system

    \frac{dy}{dt} = -f_{xx}(x_*)\, y, \qquad y = x - x_*.

If the matrix $f_{xx}(x_*)$ is positive definite, all the
characteristic roots of the matrix $-f_{xx}(x_*)$ are real and strictly
less than zero. This implies a local exponential convergence of
the method (2.3) to the point of local minimum.
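
The monotone behaviour of the three Lyapunov functions along the trajectories of (2.3) can be observed numerically. The sketch below rests on assumptions of our own: the quadratic test function, the step size and the names `grad`, `fval`, `gradient_flow` are hypothetical. It integrates the gradient system by a Runge-Kutta scheme and records $v_1$, $v_2$, $v_3$ at every step.

```python
X_STAR = [1.0, -0.5]                      # minimizer of the test function


def fval(x):
    """Hypothetical strictly convex test function."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def grad(x):
    """Gradient f_x of the test function."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


def gradient_flow(x0, T=5.0, h=0.01):
    """RK4 integration of dx/dt = -f_x(x); returns final x and (v1, v2, v3) history."""
    def rhs(x):
        return [-g for g in grad(x)]

    x, history = list(x0), []
    for _ in range(int(round(T / h)) + 1):
        g = grad(x)
        v1 = fval(x) - fval(X_STAR)
        v2 = 0.5 * (g[0] ** 2 + g[1] ** 2)
        v3 = 0.5 * ((x[0] - X_STAR[0]) ** 2 + (x[1] - X_STAR[1]) ** 2)
        history.append((v1, v2, v3))
        k1 = rhs(x)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(x, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(x, k2)])
        k4 = rhs([a + h * b for a, b in zip(x, k3)])
        x = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x, history


x_final, history = gradient_flow([3.0, 2.0])
all_decreasing = all(b[i] <= a[i] + 1e-15
                     for a, b in zip(history, history[1:]) for i in range(3))
print(x_final, all_decreasing)
```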

As one can see from the example considered, the application
of various theorems on convergence, or even of Theorem 2.2.3 but
only with different Lyapunov functions, enables one to form a more
comprehensive idea about the method in question. Unfortunately,
Lyapunov functions can be constructed in this simple manner only
in a few cases.

Let the function f(x) be everywhere twice differentiable
and let the matrix $f_{xx}(x)$ be everywhere nonsingular. Then we
can consider a method involving finding limit points of the
following problem:

    \frac{dx}{dt} = -f_{xx}^{-1}(x) f_x(x), \qquad x(0) = x_0.     (2.4)

To justify the convergence we use the Lyapunov function
$v(x) = \langle f_x(x), f_x(x) \rangle$. Differentiating v with respect to (2.4),
we obtain

    \frac{dv}{dt} = -2v, \qquad v(0) = \langle f_x(x_0), f_x(x_0) \rangle,

yielding in turn $v(t) = v(0) e^{-2t}$. The method converges thus to
the stationary points of the function f(x) as $t \to \infty$.
The method (2.4) is usually referred to as the continuous
analog of Newton's method which will be described in Section 5.
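
The decay law $v(t) = v(0) e^{-2t}$ can be verified on a scalar example, where the inverse of $f_{xx}$ is a plain division. The following is a minimal sketch with a hypothetical test function $f(x) = x^4/4 + x^2/2 - x$; the names `df`, `d2f`, `newton_flow` are assumptions made for the illustration.

```python
import math


def df(x):
    """First derivative of the hypothetical f(x) = x^4/4 + x^2/2 - x."""
    return x ** 3 + x - 1.0


def d2f(x):
    """Second derivative; positive everywhere, hence never singular."""
    return 3.0 * x ** 2 + 1.0


def newton_flow(x0, T, h=0.001):
    """RK4 integration of the scalar version of (2.4): dx/dt = -f'(x)/f''(x)."""
    def rhs(x):
        return -df(x) / d2f(x)

    x = x0
    for _ in range(int(round(T / h))):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


x0, T = 2.0, 3.0
xT = newton_flow(x0, T)
v0, vT = df(x0) ** 2, df(xT) ** 2
ratio = vT / (v0 * math.exp(-2.0 * T))     # should be close to 1
print(xT, ratio)
```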
Now we consider the problem of finding the minimax Olen Gils Ome) 
We assume that the function F(x,y) is strictly convex-concave 
and has a saddle point [X,,Y x] (see Section 1.5). The simplest 


numerical method for solving this problem consists in finding limit 


points of solutions to the following system: 
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xa dy = 2 
AE a of CF) ; a. Pts) ; (As) 


To prove the convergence we use the positive definite functions 
e 


a (%, =U. (e PFE, 9) Pb 
v(x, =F llx—xe P+] y—el’]. 

If the function F(x,y) is twice differentiable on E"™~x Eas 
and the matrices Pex hey) and oF yt are everywhere posi- 
tive definite, then 

FE (x, y) Pex ¥) Pe (X,Y) + 
Cony) Peale) Baz y) 0, 
the equality sign holds only at the points x = Ags -¥ (Pay oo 

Differentiating $v_2$ by the system (2.5) and using the strict
convexity-concavity condition plus the inequalities (1.5.16), we
obtain

$$\dot v_2 = \langle F_x(x, y), x_* - x \rangle + \langle F_y(x, y), y - y_* \rangle \le F(x_*, y) - F(x, y) + F(x, y) - F(x, y_*) \le 0,$$

the strict inequality holding if and only if $[x, y] \ne [x_*, y_*]$. By
the Barbashin-Krassovskij theorem, we conclude that in the case
where the function F(x,y) is differentiable and everywhere strictly
convex-concave, the method (2.5) converges globally to the unique
solution of the problem (1.5.2).

We consider now a particular case of the problem (1.5.2)
where n = m = 1, F(x,y) = xy. It is easy to verify that the
point [0,0] is a saddle point. The method (2.5) leads to the
following system:

$$\frac{dx}{dt} = -y, \qquad \frac{dy}{dt} = x. \tag{2.6}$$


Differentiating the positive definite function $v(x, y) = x^2 + y^2$
by this system, we obtain that $\dot v(x, y) \le 0$. The system (2.6) is
reduced to the equation

$$\frac{d^2 y}{dt^2} + y = 0,$$

the solution of which,

$$y(t) = y_0 \cos t + x_0 \sin t,$$

has no limit as $t \to \infty$. This example demonstrates that when
using the method of Lyapunov functions one ought to verify
thoroughly whether all of the conditions have been satisfied. In the
present case the derivative $\dot v \equiv 0$; hence the conditions of
Theorem 2.2.2, that is, Lyapunov's theorem on stability, are
satisfied, but the condition of Theorem 2.2.3, Lyapunov's theorem
on asymptotic stability, does not hold.
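
The contrast between stability and asymptotic stability can be seen from closed-form trajectories. The sketch below is illustrative only: the second function F(x, y) = x^2/2 + x*y - y^2/2, the initial point, and the function names are assumptions introduced here. For F(x, y) = xy the quantity x^2 + y^2 stays constant along the solution, while for the strictly convex-concave comparison function the same quantity decays.

```python
import math

def bilinear_flow(x0, y0, t):
    # Exact solution of dx/dt = -y, dy/dt = x, i.e. the method for F(x, y) = x*y.
    return (x0 * math.cos(t) - y0 * math.sin(t),
            y0 * math.cos(t) + x0 * math.sin(t))

def damped_flow(x0, y0, t):
    # Exact solution of dx/dt = -(x + y), dy/dt = x - y, i.e. the same method
    # for the assumed strictly convex-concave F(x, y) = x^2/2 + x*y - y^2/2.
    x, y = bilinear_flow(x0, y0, t)
    return math.exp(-t) * x, math.exp(-t) * y

def v(x, y):
    # Candidate Lyapunov function v(x, y) = x^2 + y^2.
    return x * x + y * y

for t in (0.0, 1.0, 10.0, 100.0):
    print(t, v(*bilinear_flow(1.0, 2.0, t)), v(*damped_flow(1.0, 2.0, t)))
```

The first column of values does not change with t, so the trajectory circles the saddle point forever; the second column tends to zero.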


6. THE NOTION OF CONVERGENCE 


From the two examples above one sees that for numerical methods it
is essential that solutions of systems of differential equations
converge to some set $X_*$. The fact that the point $x_*$ is an
asymptotically stable equilibrium for the system (1.1) implies
that the solutions of (1.1) converge locally to $x_*$. The converse
does not, in general, hold. Hence the convergence conditions must
be weaker than the asymptotic stability conditions. In numerical
methods, the notion of convergence proper can be interpreted more
widely than just the tendency of $x(x_0, t)$ to $x_*$ as $t \to \infty$.
In this connection we state a few definitions.


DEFINITION 2.2.4. The point $\bar x \in E^n$ is an $\omega$-limit point of the
solution $x(x_0, t)$ to the system (1.1) if there exists a sequence
$\{t_k\}$ such that $\lim_{t_k \to \infty} x(x_0, t_k) = \bar x$. The set
$W(x_0)$ of all $\omega$-limit points of the solution $x(x_0, t)$ is said to
be the $\omega$-limit set.
DEFINITION 2.2.5. The method (1.1) converges to the set $X_* \subset E^n$
locally (or on X, or globally) if $W(x_0) \subset X_*$ for any $x_0$
belonging to some neighborhood $G(X_*)$ (or respectively X or
$E^n$).

The convergence of the method on X to the set $X_*$ implies
thus that for any $x_0 \in X$ and any $\varepsilon > 0$ there exists
$T(x_0, \varepsilon)$ such that for all $t \ge T(x_0, \varepsilon)$ we have
$\mathrm{dis}(x(x_0, t), X_*) < \varepsilon$.
Let R= EXIT, 
LEMMA 2.2.1. Suppose the system (1.1) admits a differentiable
function v(x,t), nonnegative on R and such that its total
derivative $\dot v(x, t)$, by virtue of the system (1.1), is nonpositive
on R; and further, that the solution $x(x_0, t)$ is extendable as
$t \to \infty$. Then:

1. for any sequence $\{t_k\}$, $t_k \to \infty$, the limit
$\lim_{t_k \to \infty} v(x(x_0, t_k), t_k)$ is defined; this limit does not
depend on the concrete choice of the sequence $\{t_k\}$;
2. there exists a sequence $\{t_i\}$ such that

$$\lim_{t_i \to \infty} \frac{d}{dt} v(x(x_0, t_i), t_i) = 0.$$

Proof. Since $v(x, t) \ge 0$ and $\dot v(x, t) \le 0$ on R, the limit
$\lim_{t \to \infty} v(x(x_0, t), t) = \bar v(x_0)$ is defined. Therefore,
assertion 1 is satisfied for any sequence $\{t_k\}$. We assume that
there is no sequence $\{t_i\}$ for which
$\lim_{t_i \to \infty} \dot v(x(x_0, t_i), t_i) = 0$. Then one can find
$\delta > 0$ and $T(\delta) > 0$ such that $\dot v(x(x_0, t), t) \le -\delta$ for
$t > T(\delta)$. This implies that $v(x(x_0, t), t) \to -\infty$ as
$t \to \infty$, which contradicts the nonnegativeness of v(x,t) on R. ///
Let us introduce two sets:

$$G_\varepsilon = \{x \in E^n : \mathrm{dis}(x, X_*) < \varepsilon\},$$
$$\Gamma_\varepsilon = \{x \in E^n : \mathrm{dis}(x, X_*) = \varepsilon\}.$$

Assume that the set $X_*$ is compact.
THEOREM 2.2.7 (Yu. G. Evtushenko [9]). Let solutions to (1.1)
be extendable as $t \to \infty$ and let there exist a continuously
differentiable function v(x) on $E^n$ such that v(x) = 0 for
$x \in X_*$ and v(x) > 0 for $x \notin X_*$; furthermore, let the
derivative $\dot v(x)$ by virtue of the system (1.1) satisfy these
conditions: for any $\varepsilon > 0$ there exists $t(\varepsilon)$ such that for all
$t \ge t(\varepsilon)$ the conditions

$$\rho(\varepsilon, t) = \sup_{x \in E^n \setminus G_\varepsilon} \dot v(x) < 0, \qquad \lim_{t \to \infty} \int_{t(\varepsilon)}^{t} \rho(\varepsilon, s)\, ds = -\infty \tag{2.7}$$

are satisfied. Then the method (1.1) converges to the set $X_*$
globally.
Proof. We show that for any fixed $\varepsilon > 0$, $x_0 \in E^n$ there exists
T such that $x(x_0, t) \in G_\varepsilon$ for all $t \ge T$. Let

$$a = \min_{x \in \Gamma_\varepsilon} v(x).$$

Since v(x) = 0 everywhere on $X_*$, it is possible to take
$\delta \in (0, \varepsilon)$ so small that

$$\max_{x \in \Gamma_\delta} v(x) < a.$$

Next we show that for T we can take any value satisfying the
following two conditions:

$$T \ge t(\delta), \qquad x(x_0, T) \in G_\delta. \tag{2.8}$$

We show that such a value exists. Assume the opposite: for all
$t \ge t(\delta)$ we have $x(x_0, t) \notin G_\delta$. Along the trajectory
$x(x_0, t)$ we then have

$$v(x(x_0, t)) - v(x(x_0, t(\delta))) \le \int_{t(\delta)}^{t} \rho(\delta, s)\, ds.$$

Letting $t \to \infty$ and noting (2.7), we obtain that
$v(x(x_0, t)) \to -\infty$, which is impossible since $v(x) \ge 0$; hence we
can find T satisfying (2.8). The trajectory $x(x_0, t)$ does not leave
the set $G_\varepsilon$ for $t \ge T$, since otherwise we could find
$T_2 > T_1 \ge T$ such that

$$x(x_0, T_1) \in \Gamma_\delta, \qquad v(x(x_0, T_1)) < a,$$
$$x(x_0, T_2) \in \Gamma_\varepsilon, \qquad a \le v(x(x_0, T_2)).$$

At the same time, for $T_1 \le t \le T_2$ we have

$$x(x_0, t) \in E^n \setminus G_\delta.$$

By the first condition of (2.7) on this segment of the trajectory:

$$\sup_{x \in E^n \setminus G_\delta} \dot v(x) < 0. \tag{2.9}$$

Noting that $v(x(x_0, T_1)) < a \le v(x(x_0, T_2))$, we obtain a
contradiction with the inequality (2.9). Hence the method (1.1)
converges to the set $X_*$. Due to the arbitrariness of the initial
point $x_0$ we have proved the global convergence. ///
The assumption concerning the extendability of solutions can 


be removed when we require instead that the conditions of Theorem 
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2c DENSatisttedy themequality Wwex,t)is= wx) =. 0 hold} and 
Pind vathatweenemTunet Lon —s v(xwo) — admits an Inftinitesimal upper 
eLMscte Omens Ox ny ) ve OFS, 

It is interesting to note that the methods do not necessarily
converge to the equilibrium points. This problem has been studied
in more detail in Yu.G. Evtushenko and V.G. Zhadan [2].


3. THEOREMS ON CONVERGENCE OF ITERATIVE PROCESSES 


1. BASIC DEFINITIONS 


Let a single-valued mapping $T: R^n \to R^n$ be defined everywhere on
$R^n$. We pose the problem of finding fixed points of this mapping,
i.e., the points belonging to the set

$$X_* = \{x \in R^n : T(x) = x\}.$$


To solve this problem one can use a numerical method involving
iterations through the formula

$$x_{k+1} = T(x_k). \tag{3.1}$$

We shall call this method in the sequel the method of simple
iteration. We specify an initial point $x_0$ and define uniquely the
sequence $\{x_k\}$ by using (3.1).

DEFINITION 2.3.1. The iterations generated by (3.1) converge
locally to the point $x_*$ (or on the set $X \subset R^n$, or globally)
if for any initial point $x_0$ belonging to $G(x_*)$ (or respectively
X or $R^n$) there exists a limit of the sequence $\{x_k\}$
coinciding with the point $x_*$.

For only a few numerical methods are the sequences $\{x_k\}$
themselves convergent. Nevertheless, methods without this property
are widely used and work effectively. We cite, for example, the
penalty function method (see Chapter 3). Methods of this type
possess the following essential property: each limit point of the
sequence $\{x_k\}$ belongs to the set sought, which is why they are
referred to as convergent. A natural analog of Definition 2.2.5 of
the method's convergence is
DEFINITION 2.3.2. The method (3.1) converges to the set $X_*$
locally (or on the set X, or globally) if for any $x_0$ belonging
to $G(X_*)$ (or respectively X, or $R^n$) the sequence $\{x_k\}$ is
defined, has a non-empty set of limit points, and each limit point
belongs to $X_*$.

In investigating numerical methods the following two questions
arise:

1. Under what conditions is the convergence of the method
guaranteed?
2. What is the rate of the convergence?

It is worth noting that the answers to these questions are
of both theoretical and practical value: frequently, either the
method proper or some stages of the method are constructed so that
convergence is guaranteed; the theoretical investigation of the
methods is simultaneously a device for creating them.
Construction of numerical methods for solving optimization problems
usually involves mappings under which the set of fixed points
coincides with those points at which necessary conditions for the
extremum of the given optimization problem are satisfied.
Sufficient conditions for the convergence of numerical methods for
finding fixed points are equivalent or almost equivalent to the
sufficient conditions for the extremum in the optimization problem.


In what follows we shall use two types of estimates of the
rate of convergence. Assume that the iterations generated by (3.1)
converge to $x_*$. We define, for $p \in [1, \infty)$,

$$\beta_p(\{x_k\}) = \begin{cases}
0, & \text{if } x_k = x_* \text{ for all except a finite number of the subscripts } k; \\
\limsup\limits_{k \to \infty} \dfrac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|^p}, & \text{if } x_k \ne x_* \text{ for all except a finite number of the subscripts } k; \\
\infty & \text{otherwise.}
\end{cases}$$

Let $C(T, x_*)$ be the set of all sequences with the limit $x_*$
generated by the iterative process T. Then the quantities

$$Q_p(x_*) = \sup_{\{x_k\} \in C(T, x_*)} \beta_p(\{x_k\})$$

are said to be the Q-factors of the process T at the point $x_*$.
Suppose there are two methods $T_1$ and $T_2$, and the Q-factors
$Q_p^1(x_*)$ and $Q_p^2(x_*)$ are known for them, respectively. If
$p \in [1, \infty)$ is such that $Q_p^1(x_*) < Q_p^2(x_*)$, we say that the
method $T_1$ is Q-faster than the method $T_2$ at the point $x_*$. It is
not difficult to show that such a definition is correct, i.e., if
for some p the method $T_1$ is Q-faster than the method $T_2$ at
the point $x_*$, there exists no other $p' \in [1, \infty)$ at which the
method $T_2$ is Q-faster than $T_1$. The notion of "Q-faster"
depends on the norm chosen and it may happen that in one norm the
process $T_1$ is Q-faster and in another norm the process $T_2$ is
Q-faster.
Concerning the iterative process (3.1) we say that its rate
of convergence is

• linear (the process converges at the rate of a geometric
progression) if $0 < Q_1(x_*) < 1$;

• superlinear, if $Q_1(x_*) = 0$;

• quadratic, if $0 < Q_2(x_*) < \infty$;

• superquadratic, if $Q_2(x_*) = 0$.
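
The ratios entering the Q-factor can be observed on a single sequence. The sketch below is an illustration under assumptions of our own: both mappings have the fixed point sqrt(2), and the names `linear_map`, `newton_map`, and `error_ratios` are hypothetical. A single sequence only gives the value of the ratio for that sequence, not the supremum over all sequences.

```python
import math

X_STAR = math.sqrt(2.0)

def linear_map(x):
    # Assumed example with T'(x_*) = 1 - x_*/2 (about 0.29): linear convergence.
    return x - 0.25 * (x * x - 2.0)

def newton_map(x):
    # Newton's method for x^2 = 2; here T'(x_*) = 0.
    return 0.5 * x + 1.0 / x

def error_ratios(T, x0, p, steps):
    # Ratios |x_{k+1} - x_*| / |x_k - x_*|^p along one sequence.
    ratios, x = [], x0
    for _ in range(steps):
        x_next = T(x)
        ratios.append(abs(x_next - X_STAR) / abs(x - X_STAR) ** p)
        x = x_next
    return ratios

lin = error_ratios(linear_map, 2.0, p=1, steps=10)
newt = error_ratios(newton_map, 2.0, p=2, steps=3)
print(lin[-1], newt[-1])
```

For the first mapping the ratio with p = 1 settles at a value strictly between 0 and 1; for the Newton mapping the ratio with p = 2 stays bounded, which is the quadratic case.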


In some cases it is possible to obtain another numerical
characteristic of the convergence rate of iterations, e.g., the
number $\gamma$ if one succeeds in proving that for any k = 1, 2, ...

$$\|x_k - x_*\| \le \gamma\, \|x_{k-1} - x_*\|, \tag{3.2}$$

$\gamma$ being constant for all sequences $\{x_k\}$ over the range of
convergence of the iterations (3.1) to the point $x_*$.
A detailed analysis of Q-factors and definitions of other 


estimates of convergence rates are given in Ortega and Rheinboldt 


fee 


2. THE PRINCIPLE OF CONTRACTION MAPPINGS


DEFINITION 2.3.3. We say that the mapping T is a contraction,
or contracting, on X if there exists $0 \le q < 1$ such that
for any points x, y belonging to X the condition

$$\|T(x) - T(y)\| \le q\, \|x - y\| \tag{3.3}$$

is satisfied. The q is known as the constant of contraction for
the mapping T.

A contraction mapping is continuous. Indeed, for any $\varepsilon > 0$
one may take $\delta = \varepsilon / q$ (or any $\delta > 0$ if q = 0); then from the
condition $\|x - y\| < \delta$, by (3.3) it follows that
$\|T(x) - T(y)\| \le \varepsilon$.

THEOREM 2.3.1 (Banach Fixed Point Principle). Let the mapping T
be contracting on $R^n$. Then:

1. the mapping T has only one fixed point $x_*$;

2. the iterations defined by the formula (3.1) converge to
$x_*$ globally;

3. the convergence rate is estimated by the inequality

$$\|x_k - x_*\| \le \frac{q^k}{1 - q}\, \|x_1 - x_0\|. \tag{3.4}$$

Proof. It follows from (3.3) that

$$\|x_{k+1} - x_k\| \le q\, \|x_k - x_{k-1}\| \le \ldots \le q^k\, \|x_1 - x_0\|,$$

whence for any s > 0 we have

$$\|x_{k+s} - x_k\| \le \|x_{k+s} - x_{k+s-1}\| + \|x_{k+s-1} - x_{k+s-2}\| + \ldots + \|x_{k+1} - x_k\|$$
$$\le \left[ q^{k+s-1} + q^{k+s-2} + \ldots + q^k \right] \|x_1 - x_0\| = \frac{q^k (1 - q^s)}{1 - q}\, \|x_1 - x_0\| \le \frac{q^k}{1 - q}\, \|x_1 - x_0\| < \varepsilon \tag{3.5}$$

if $k > N(\varepsilon)$, where $N(\varepsilon)$ is sufficiently large. Therefore, the
sequence $\{x_k\}$ is fundamental and, since the space $R^n$ is
complete, the limit

$$x_* = \lim_{k \to \infty} x_k$$

exists.
Passing to the limit in the equality (3.1) as $k \to \infty$, with
the continuity of T(x) taken into account, we obtain
$x_* = \lim_{k \to \infty} x_{k+1} = T(\lim_{k \to \infty} x_k) = T(x_*)$.
Therefore $x_*$ is a fixed point of the mapping T.

Next we prove uniqueness. If there existed, in addition to
$x_*$, another fixed point y, we would have
$\|y - x_*\| = \|T(y) - T(x_*)\| \le q\, \|y - x_*\|$; but this is possible
only when $\|y - x_*\| = 0$, i.e., $y = x_*$. If in the inequality (3.5)
we pass to the limit as $s \to \infty$, we obtain the estimate (3.4).
///
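
The a priori estimate (3.4) is easy to check on a scalar example. This is a minimal sketch under assumptions: the mapping T(x) = cos(x)/2, its contraction constant q = 1/2, the starting point, and the helper name `simple_iteration` are all chosen here for illustration.

```python
import math

def T(x):
    # Assumed example: |T'(x)| = |sin x| / 2 <= 1/2, so T is a contraction on R.
    return 0.5 * math.cos(x)

Q = 0.5  # contraction constant of the assumed mapping T

def simple_iteration(x0, steps):
    # Method of simple iteration x_{k+1} = T(x_k).
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs

xs = simple_iteration(3.0, 60)
x_star = xs[-1]  # numerically converged approximation of the fixed point
first_step = abs(xs[1] - xs[0])

# Check |x_k - x_*| <= q^k / (1 - q) * |x_1 - x_0| for the first 40 iterates;
# the tiny additive slack only absorbs floating-point rounding.
bound_holds = all(
    abs(xs[k] - x_star) <= Q ** k / (1.0 - Q) * first_step + 1e-15
    for k in range(40)
)
print(x_star, bound_holds)
```

The bound is usually pessimistic: near the fixed point the actual contraction is governed by |T'(x_*)|, which is smaller than q in this example.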

THEOREM 2.3.2. Let the mapping T, contracting on X, map the
closed set X into itself ($T: X \to X$). Then the mapping T has
only one fixed point $x_*$ in X, the iterations defined by (3.1)
converge linearly to $x_*$ on X, and the estimate (3.4) holds.


The proof is almost the same as that of Theorem 2.3.1. 


3. MONOTONE MAPPINGS 


Let x, y be arbitrary elements of the set $X \subset E^n$.
DEFINITION 2.3.4. We say that the mapping $V: X \to X$ is:

• bounded if the image V(M) of each bounded set $M \subset X$ is
bounded;

• potential if there exists a differentiable function f(x)
such that V(x) is its gradient; the function f is then called
the potential of the mapping V(x);

• monotone if

$$0 \le \langle V(x) - V(y), x - y \rangle \quad \forall x, y \in X;$$

• uniformly monotone if there exists m > 0 (monotonicity constant)
such that

$$m\, \|x - y\|^2 \le \langle V(x) - V(y), x - y \rangle \quad \forall x, y \in X; \tag{3.6}$$

• Lipschitz (with constant $\ell$) if

$$\|V(x) - V(y)\| \le \ell\, \|x - y\| \quad \forall x, y \in X. \tag{3.7}$$

If the mapping V is simultaneously uniformly monotone and
Lipschitz, the conditions (3.6) and (3.7) imply that $m \le \ell$.


THEOREM 2.3.3. Let the mapping $V: E^n \to E^n$ be Lipschitz with
constant $\ell$ and uniformly monotone with monotonicity constant m.
Then the mapping $T: E^n \to E^n$ defined by

$$T(x) = x - \tau V(x) \tag{3.8}$$

is contracting for any $\tau \in (0, 2m/\ell^2)$ and

$$\|T(x) - T(y)\| \le q\, \|x - y\|,$$

where

$$q = (1 - 2\tau m + \tau^2 \ell^2)^{1/2} < 1.$$

Proof. For any $x, y \in E^n$ we have

$$\|T(x) - T(y)\|^2 = \|x - \tau V(x) - y + \tau V(y)\|^2$$
$$= \|x - y\|^2 - 2\tau \langle V(x) - V(y), x - y \rangle + \tau^2 \|V(y) - V(x)\|^2$$
$$\le \|x - y\|^2 - 2\tau m\, \|x - y\|^2 + \tau^2 \ell^2\, \|x - y\|^2 = q^2\, \|x - y\|^2. \quad ///$$
In constructing numerical methods it is expedient to do one's
best to ensure the highest rate of convergence. Theorem 2.3.3
allows us to find the best value of $\tau$, starting from the
minimization of the estimate of the ratio
$\|T(x) - T(y)\| / \|x - y\|$. The minimum value of q with respect to
the parameter $\tau$ is attained for $\tau = m/\ell^2$ and is

$$q = \left( 1 - \frac{m^2}{\ell^2} \right)^{1/2}.$$
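
The contraction estimate can be tested empirically on a simple potential mapping. The following sketch rests on assumptions: the mapping V is the gradient of the test function f(x) = 1.5*x1^2 - cos(x1) + 0.5*x2^2, for which the monotonicity constant 1 and the Lipschitz constant 4 can be verified by hand, and the sampling region and helper names are chosen only for illustration.

```python
import math
import random

M, L = 1.0, 4.0  # monotonicity and Lipschitz constants of the assumed V below

def V(x):
    # Assumed example: gradient of f(x) = 1.5*x1^2 - cos(x1) + 0.5*x2^2.
    return (3.0 * x[0] + math.sin(x[0]), x[1])

def T(x, tau):
    # Mapping T(x) = x - tau * V(x).
    vx = V(x)
    return (x[0] - tau * vx[0], x[1] - tau * vx[1])

def q(tau):
    # Contraction constant q = sqrt(1 - 2*tau*m + tau^2 * l^2).
    return math.sqrt(1.0 - 2.0 * tau * M + tau * tau * L * L)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

tau_best = M / L ** 2
rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    # Largest observed ratio ||T(x) - T(y)|| / ||x - y|| over the samples.
    worst = max(worst, dist(T(x, tau_best), T(y, tau_best)) / dist(x, y))
print(tau_best, q(tau_best), worst)
```

The observed ratio never exceeds q at the step m/l^2; at the end of the admissible interval, tau = 2m/l^2, the estimate degenerates to q = 1.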


4. CONVERGENCE OF THE FIRST ORDER APPROXIMATION


We present now the discrete analog of Theorem 2.1.1 on stability of
the first-order approximation. First we prove the following lemma.
LEMMA 2.3.1. Suppose there exists a neighborhood $G(x_*)$ of the
fixed point $x_*$ of the mapping T, such that for any $x \in G(x_*)$
the conditions

$$\|T(x) - x_*\| \le q\, \|x - x_*\|, \qquad q < 1,$$

are satisfied. Then the iterations defined by (3.1) converge
linearly locally to the point $x_*$.

Proof. We have the inequalities

$$\|x_1 - x_*\| = \|T(x_0) - x_*\| \le q\, \|x_0 - x_*\|.$$

Hence it follows from the condition $x_0 \in G(x_*)$ by induction that
all the points $x_k$ belong to $G(x_*)$ and satisfy the condition

$$\|x_k - x_*\| \le q^k\, \|x_0 - x_*\|;$$

hence $\lim_{k \to \infty} x_k = x_*$. ///


THEOREM 2.3.4. Let the mapping $T: R^n \to R^n$ be differentiable at
the fixed point $x_*$ and let the spectral radius S of the matrix
$T_x(x_*)$ satisfy the condition S < 1; then the iterations defined
by (3.1) converge linearly locally to $x_*$, and

$$Q_1(x_*) \le \|T_x(x_*)\|. \tag{3.9}$$

Proof. Using the differentiability of the mapping T, we obtain

$$\|T(x) - x_*\| = \left\| T(x_*) + T_x(x_*)(x - x_*) + \|x - x_*\|\, \theta(x_*, x - x_*) - x_* \right\|. \tag{3.10}$$

Here the vector function $\theta$ is such that

$$\lim_{x \to x_*} \|\theta(x_*, x - x_*)\| = 0. \tag{3.11}$$

Hence for any $\varepsilon > 0$ there exists a neighborhood $G(x_*)$ of the
point $x_*$, such that for all $x \in G(x_*)$ we have

$$\|\theta(x_*, x - x_*)\| \le \varepsilon.$$

As is well known, for any $\varepsilon > 0$ we can choose a norm
of the matrix $T_x$ such that $\|T_x(x_*)\| \le \varepsilon + S$. Noting that
$T(x_*) = x_*$, we obtain from (3.10) that for any $x \in G(x_*)$ we
have the inequality $\|T(x) - x_*\| \le (2\varepsilon + S)\, \|x - x_*\|$. By the
hypothesis of the Theorem, S < 1; hence $\varepsilon$ can be taken so small
that $2\varepsilon + S < 1$, and the conditions of Lemma 2.3.1 are satisfied,
implying the local convergence of $\{x_k\}$ to $x_*$.

From (3.10) it follows that

$$\|T(x) - x_*\| \le \|x - x_*\| \left[ \|T_x(x_*)\| + \|\theta(x_*, x - x_*)\| \right].$$

Using this inequality for each element of the sequence $\{x_k\}$
generated by (3.1), and noting the property (3.11), we obtain

$$\limsup_{k \to \infty} \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|} \le \lim_{k \to \infty} \left[ \|T_x(x_*)\| + \|\theta(x_*, x_k - x_*)\| \right] = \|T_x(x_*)\|,$$

i.e., for the Q-factor the estimate (3.9) holds. ///


We note that a sufficient condition for the iterations (3.1)
to converge is that a norm of the matrix $T_x(x_*)$ be less than
unity, since this will imply that all eigenvalues of the matrix
$T_x(x_*)$ are less than unity in modulus and therefore S < 1.
It follows from (3.9) that if $T_x(x_*) = 0$ the convergence rate of
the iterations (3.1) is superlinear. The fact that the spectral
radius of the matrix $T_x(x_*)$ is zero does not guarantee a
superlinear rate of convergence.
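
The difference between a norm condition and the spectral radius condition shows up already for linear mappings. The sketch below uses an assumed 2x2 matrix whose eigenvalues both equal 0.5 but whose Euclidean operator norm exceeds one; the matrix, the starting point, and the helper names are illustrative choices only.

```python
import math

A = ((0.5, 2.0), (0.0, 0.5))  # assumed example; both eigenvalues equal 0.5

def T(x):
    # Linear mapping T(x) = A x with fixed point x_* = 0 and T_x(x_*) = A.
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def norm(x):
    return math.hypot(x[0], x[1])

x = (0.0, 1.0)
norms = [norm(x)]
for _ in range(60):
    x = T(x)
    norms.append(norm(x))

print(norms[1], norms[-1])
```

The error grows on the first step, which a norm-based test in the Euclidean norm would flag, yet the iterations still converge because the spectral radius is below one.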
THEOREM 2.3.5. Let the mapping $T: R^n \to R^n$ be differentiable at
the fixed point $x_*$, the matrix $T_x(x_*)$ being symmetric. Then
a sufficient condition for the iterations (3.1) to converge
locally to the point $x_*$ is that the eigenvalue $\kappa$ of largest
modulus of the matrix $T_x(x_*)$ satisfy $|\kappa| < 1$. The corresponding
convergence rate is $Q_1(x_*) = |\kappa|$.
Proof. From the symmetry of the matrix $T_x(x_*)$ it follows that
its eigenvalues are real. Let us use Theorem 2.3.4. We assume
that the norm of the matrix $T_x(x_*)$ is the operator norm. Then
the spectral radius of the matrix $T_x(x_*)$ coincides with its norm
as well as with the modulus of the eigenvalue $\kappa$. From the condition
$|\kappa| < 1$ it follows that the conditions of Theorem 2.3.4
guaranteeing the convergence are satisfied. The symmetry of the
matrix $T_x(x_*)$ holds, in particular, if the mapping T is
potential. ///
The estimates of the convergence rate given by Theorem 2.3.4
and Theorem 2.3.5 are of asymptotic nature. An estimate of the
(3.2)-type is given by:

THEOREM 2.3.6. Let the mapping $T: R^n \to R^n$ be differentiable in
some neighborhood $G(x_*)$ of the fixed point $x_*$ and let

$$\gamma = \sup_{x \in G(x_*)} \|T_x(x)\| < 1.$$

Then the iterations defined by (3.1) converge locally to $x_*$ and
the estimate of the convergence rate

$$\|x_k - x_*\| \le \gamma\, \|x_{k-1} - x_*\| \le \gamma^k\, \|x_0 - x_*\| \tag{3.12}$$

holds.

Proof. From the fact that the spectral radius of the matrix does
not exceed any of its norms it follows that the spectral radius
of the matrix $T_x(x_*)$ is less than unity and therefore the
iterations converge locally to the point $x_*$.

Now we estimate the norm

$$\|x_k - x_*\| = \|T(x_{k-1}) - T(x_*)\|.$$

Using the Lagrange formula for mappings (see Appendix 1), we
obtain

$$\|x_k - x_*\| \le \sup_{0 \le \theta \le 1} \|T_x(x_* + \theta (x_{k-1} - x_*))\| \cdot \|x_{k-1} - x_*\| \le \gamma\, \|x_{k-1} - x_*\|,$$

which yields (3.12). ///

5. THE CONNECTION BETWEEN THE CONVERGENCE OF DISCRETE
PROCESSES AND THE CONVERGENCE OF CONTINUOUS PROCESSES


In Section 2.1 we obtained sufficient conditions for exponential
stability of the trivial solution $x(t) \equiv x_*$ for systems of the
form

$$\frac{dx}{dt} = F(x), \qquad F(x_*) = 0. \tag{3.13}$$

The analysis involved investigating roots of the equation

$$|F_x(x_*) - \lambda I_n| = 0. \tag{3.14}$$

Integrating (3.13) using the Euler scheme, we arrive at the
following iterative process:

$$x_{k+1} = x_k + \alpha F(x_k) \equiv T(x_k). \tag{3.15}$$


By Theorem 2.3.4, the question whether the process (3.15)
converges can be answered by analyzing the equation

$$|I_n + \alpha F_x(x_*) - \mu I_n| = 0.$$

Transforming it to the form

$$\left| F_x(x_*) + \frac{1 - \mu}{\alpha} I_n \right| = 0$$

and comparing the result with (3.14), we conclude that the roots
of these equations are related in a simple way: $\mu = 1 + \alpha\lambda$.
We express the complex root $\lambda$ as $\lambda = a + ib$; then

$$\mu = 1 + \alpha a + i \alpha b, \qquad |\mu|^2 = (1 + \alpha a)^2 + \alpha^2 b^2.$$

The condition $|\mu| < 1$ can be written as

$$1 + 2\alpha a + \alpha^2 (a^2 + b^2) < 1,$$

i.e., it holds if a < 0 and $0 < \alpha < \alpha(\lambda)$, where

$$\alpha(\lambda) = -\frac{2 \operatorname{Re} \lambda}{|\lambda|^2}.$$

Let $\lambda_1, \ldots, \lambda_n$ be the roots of Equation (3.14), all with
negative real parts. Then the condition $|\mu| < 1$ holds for any
$\alpha$ satisfying the condition

$$0 < \alpha < \bar\alpha = 2 \min_{i \in [1:n]} \frac{|\operatorname{Re} \lambda_i|}{|\lambda_i|^2}.$$

We have arrived at the following theorem.
THEOREM 2.3.7. Let the sufficient conditions of Theorem 2.1.1 on
stability of the first-order approximation be satisfied for the
system (3.13) at $x_*$. Then there exists $\bar\alpha$ such that for any
fixed $0 < \alpha < \bar\alpha$ the iterations defined by (3.15) converge
linearly locally to the point $x_*$.
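
The step bound can be computed directly from the roots of the characteristic equation of the continuous system. The sketch below assumes a pair of complex conjugate roots -1 ± 3i chosen purely for illustration; the function names `step_bound` and `discrete_radius` are hypothetical.

```python
EIGENVALUES = (complex(-1.0, 3.0), complex(-1.0, -3.0))  # assumed roots

def step_bound(eigs):
    # alpha_bar = 2 * min_i |Re lambda_i| / |lambda_i|^2
    return 2.0 * min(abs(lam.real) / abs(lam) ** 2 for lam in eigs)

def discrete_radius(alpha, eigs):
    # Largest |mu_i| with mu_i = 1 + alpha * lambda_i.
    return max(abs(1.0 + alpha * lam) for lam in eigs)

alpha_bar = step_bound(EIGENVALUES)
for alpha in (0.5 * alpha_bar, alpha_bar, 1.5 * alpha_bar):
    print(alpha, discrete_radius(alpha, EIGENVALUES))
```

Below the bound the largest modulus is less than one, at the bound it equals one, and above it the Euler iterations lose stability even though the continuous system is exponentially stable.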

In the particular case where the system (3.13) realizes the
Cauchy method (2.3) of unconstrained minimization of the function
f(x), the matrix $F_x(x_*) = -f_{xx}(x_*)$. Hence all $\lambda_i$ are real.
If the matrix $f_{xx}(x_*)$ is positive definite, all $\lambda_i < 0$ and

$$\bar\alpha = 2 \min_{i \in [1:n]} \frac{|\lambda_i|}{|\lambda_i|^2} = 2 \min_{i \in [1:n]} \frac{1}{|\lambda_i|} = \frac{2}{M},$$

where M denotes the largest eigenvalue of the matrix $f_{xx}(x_*)$.

If finding limit points of solutions of the Cauchy problem
for the system (3.13) is interpreted as a numerical method for
finding an equilibrium point for the system (3.13), then (3.15) is
a discrete analog of this method. It follows from this Theorem
that the justification of exponential stability of the equilibrium
$x_*$ of the system (3.13) automatically implies the local linear
convergence of the discrete version of the method for a
sufficiently small step of integration. This procedure is widely
used to justify the convergence.

It is possible to integrate the system (3.13) using
higher-order approximation, for instance, the Euler method, the
conversion method, the Runge-Kutta method, etc. However, this does
not increase the convergence rate but instead makes these methods
more cumbersome. This is due to the fact that the application of the
high-order schemes does make it possible to "track" more precisely
each solution of the system (3.13), although there is no need to
do so because of the asymptotic stability of $x = x_*$. Hence in
what follows, going over to discrete approximations of numerical
methods, we shall use only the simplest Euler polygon.


6. THE APPLICATION TO THE UNCONSTRAINED MINIMIZATION PROBLEM 


We pose now the problem of finding the unconstrained minimum of
the function f(x). By Theorem [...], for a convex differentiable
function f this problem is equivalent to that of finding a
stationary point satisfying the condition $f_x(x_*) = 0$. From
Theorem 1.2.7 we obtain that for the convex function f the mapping
$V(x) = f_x(x)$ is monotone. We introduce the mapping

$$T(x) = x - \tau V(x) = x - \tau f_x(x).$$

We shall seek its fixed points via the method of simple iteration
(3.1):

$$x_{k+1} = x_k - \tau f_x(x_k). \tag{3.16}$$


Assume that there exist $\ell$ and m such that for any $x, y \in E^n$
the conditions

$$\|f_x(x) - f_x(y)\| \le \ell\, \|x - y\|,$$
$$m\, \|x - y\|^2 \le \langle f_x(x) - f_x(y), x - y \rangle$$

are satisfied. The mapping V is therefore a uniformly monotone,
Lipschitz potential mapping. It follows from Theorem 2.3.3,
together with Theorem 2.3.1, that the problem (1.3.4) has a unique
solution $x = x_*$, the iterations given by the formula (3.16)
converge to $x_*$ globally for any step $\tau$ satisfying the condition
$\tau \in (0, 2m/\ell^2)$, and the estimate of the convergence rate (3.4)
holds, where

$$q = (1 - 2m\tau + \tau^2 \ell^2)^{1/2}.$$


It is possible to justify the local convergence of the
iterations (3.16), using no convexity condition for the function f.
Instead we require that at the stationary point $x_*$ the
sufficient condition for a local minimum should be satisfied (see
Theorem [...]), i.e., the matrix $f_{xx}(x_*)$ is positive definite.

We consider two characteristic equations

$$|f_{xx}(x_*) - \nu I_n| = 0,$$
$$|T_x(x_*) - \kappa I_n| = |I_n - \tau f_{xx}(x_*) - \kappa I_n| = 0,$$

the roots of these equations being related through $\kappa = 1 - \tau\nu$.
We arrange the eigenvalues of the matrix $f_{xx}(x_*)$ in increasing
order: $0 < \nu_1 \le \nu_2 \le \ldots \le \nu_n$. They are in correspondence
with the eigenvalues of the matrix $T_x(x_*)$:

$$\kappa_1 = 1 - \tau\nu_1 \ge \kappa_2 = 1 - \tau\nu_2 \ge \ldots \ge \kappa_n = 1 - \tau\nu_n.$$

By Theorem 2.3.5 a sufficient condition for the iterations (3.16)
to converge is that $\kappa_n > -1$. The latter holds for any
$0 < \tau < \bar\tau$, where $\bar\tau = 2/\nu_n$. It is appropriate to choose $\tau$
from this interval such that the spectral radius $S(\tau)$ of the
matrix $T_x(x_*)$ be minimal; hence we seek

$$\min_{0 < \tau < \bar\tau} S(\tau) = \min_{0 < \tau < \bar\tau} \max \left[ |1 - \tau\nu_1|, |1 - \tau\nu_n| \right].$$

On the interval $(0, \bar\tau)$ the linear function $1 - \tau\nu_n$ changes
from 1 to -1. Hence the minimum of the function $S(\tau)$ will be
attained with $1 - \tau\nu_1 = -1 + \tau\nu_n$, i.e., for
$\tau = 2/(\nu_1 + \nu_n)$. The spectral radius of the matrix $T_x(x_*)$ is
then

$$S = \frac{\nu_n - \nu_1}{\nu_n + \nu_1}.$$
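
The choice of the step that minimizes the spectral radius can be confirmed by a brute-force search. In this sketch the extreme eigenvalues 1 and 9 of the Hessian at the minimum, the grid resolution, and the names used are assumptions made for illustration.

```python
def spectral_radius(tau, lam_min, lam_max):
    # S(tau) = max(|1 - tau*lam_min|, |1 - tau*lam_max|) for T_x = I - tau*f_xx.
    return max(abs(1.0 - tau * lam_min), abs(1.0 - tau * lam_max))

LAM_MIN, LAM_MAX = 1.0, 9.0  # assumed extreme eigenvalues of f_xx(x_*)
tau_bar = 2.0 / LAM_MAX
tau_opt = 2.0 / (LAM_MIN + LAM_MAX)
s_opt = (LAM_MAX - LAM_MIN) / (LAM_MAX + LAM_MIN)

# Brute-force check on a uniform grid inside (0, tau_bar).
grid = [tau_bar * i / 1000.0 for i in range(1, 1000)]
s_grid = min(spectral_radius(t, LAM_MIN, LAM_MAX) for t in grid)
print(tau_opt, s_opt, s_grid)
```

No grid point gives a smaller spectral radius than the closed-form step, and the resulting rate depends only on the ratio of the extreme eigenvalues.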


If the function f(x) is differentiable in the neighborhood G(x,) 


n 


of the point x, and for all x. ¢ G(x), ZekE the inequality 


ali 


2 72 
nqlelle> xe te (ie) pen] 


holds, then, by Theorem 2.3.6, the estimate 
ey Sr sea ik, sel 
k * = * O * 
holds for the convergence rate. 


7. AN AUXILIARY RESULT


In the sequel, in Chapter 3, we shall need the following lemma.
LEMMA 2.3.2. Let the mapping $T(x): E^n \to E^n$ be such that
T(0) = 0 and let the inequality $\|T(x) - T(z)\| \le c\, \|x - z\|^2$, where
c is a positive constant, hold for any $x \in E^n$, $z \in E^n$. Then the
equation

$$x = T(x + y) \tag{3.17}$$

for each vector y satisfying the condition
$\|y\| \le (\sqrt{2} - 1)/(2c)$ has a unique solution in the sphere
$H = \{x \in E^n : \|x\| \le 2c\, \|y\|^2\}$.

Proof. We show that the mapping T(x+y) maps H into itself.
Let $x \in H$. Then

$$\|T(x + y)\| \le c\, \|x + y\|^2 \le c \left[ \|x\|^2 + 2\, \|x\| \cdot \|y\| + \|y\|^2 \right]$$
$$\le c\, \|y\|^2 \left[ 4c^2 \|y\|^2 + 4c\, \|y\| + 1 \right] \le 2c\, \|y\|^2.$$

Thus, the point $T(x + y) \in H$. We show next that the mapping
T(x+y) is contracting on H. Indeed, if $x, z \in H$, $x \ne z$, then

$$\|x\| \le 2c\, \|y\|^2, \qquad \|z\| \le 2c\, \|y\|^2,$$
$$\|T(x + y) - T(z + y)\| \le c\, \|x - z\|^2 \le c\, \|x - z\| \left( \|x\| + \|z\| \right) \le 4c^2\, \|x - z\|\, \|y\|^2$$
$$\le (\sqrt{2} - 1)^2\, \|x - z\| < \|x - z\|.$$

By Theorem 2.3.2 the mapping T(x+y) has only one fixed point,
corresponding to a solution of Equation (3.17). ///


4. CONVERGENCE OF PROCESSES GENERATED BY MULTIVALUED MAPPINGS 


1. PRELIMINARY RESULTS


Let a point-set mapping $W: E^n \to 2^{E^n}$ be defined everywhere on
$E^n$ (see Appendix III). We pose the problem of finding the set of
all fixed points of the mapping W:

$$X_* = \{x \in E^n : x \in W(x)\}.$$

We assume that this set is not empty. A natural generalization
of the method of simple iteration (3.1) to multivalued
mappings is the process:

$$x_{k+1} \in W(x_k). \tag{4.1}$$

In contrast to (3.1), given an initial point $x_0$ we define
nonuniquely the sequence of points $x_1, x_2, \ldots$, which we denote
here as before by $\{x_k\}$.


If for any sequence $\{x_k\}$ obtained from (4.1) the set of
limit points is not empty and each convergent subsequence converges
to a point belonging to some set Z, we say that the method (4.1)
converges to the set Z (globally if this property holds for
all $x_0 \in E^n$, or on X if this property holds for all $x_0 \in X$,
or locally if this property holds for all $x_0$ belonging to some
neighborhood of the set Z).
We denote the sequence obtained from (4.1) by $\{x_k\}_{k \in \Lambda}$, where
$\Lambda = \{0, 1, 2, \ldots\}$ is the set of nonnegative integers. We denote
by $\Lambda_1$ an infinite set of nonnegative integers in their natural
order, for example, $\Lambda_1 = \{0, 2, 4, \ldots\}$. We denote by
$\{x_k\}_{k \in \Lambda_1}$ the subsequence of the given sequence $\{x_k\}$,
corresponding to the set $\Lambda_1$. The notation $\Lambda_2 \subset \Lambda_1$ means
that $\Lambda_2$ is an infinite subset of $\Lambda_1$. The notation
$\{x_{k+1}\}_{k \in \Lambda_1}$ is the subsequence obtained from $\{x_k\}$ when 1
has been added to each $k \in \Lambda_1$.

The justification of the convergence of the processes
considered is based on the following lemma similar to Lemma 2.2.1.
LEMMA 2.4.1. Let a continuous function v(x) be defined on the
set $X \subset E^n$, and let the sequence of points $\{x_k\}$ of X be
such that for any k the condition $v(x_{k+1}) \le v(x_k)$ is
satisfied; also, let a subsequence $\{x_k\}_{k \in \Lambda_1}$, $\Lambda_1 \subset \Lambda$,
converging to the point $x_* \in X$ exist. Then

$$\lim_{k \to \infty,\, k \in \Lambda} v(x_k) = \lim_{k \to \infty,\, k \in \Lambda} v(x_{k+1}) = v(x_*).$$

Proof. The continuity of v(x) implies

$$\lim_{k \to \infty,\, k \in \Lambda_1} v(x_k) = v(x_*). \tag{4.2}$$

By the conditions of the Lemma, the sequence $\{v(x_k)\}$ decreases
monotonically, hence for any $k \ge 0$ we have
$v(x_0) \ge v(x_k) \ge v(x_*)$. We use (4.2) as well as the definition
of the limit. For any $\varepsilon > 0$ there exists $N \in \Lambda_1$ such that for
any $i \ge N$, $i \in \Lambda_1$,

$$0 \le v(x_i) - v(x_*) \le v(x_N) - v(x_*) \le \varepsilon.$$

At the same time for any $i \in \Lambda$, $i \ge N$, the condition
$v(x_i) \le v(x_N)$ is satisfied. Hence for all $x_i \in \{x_k\}_{k \in \Lambda}$ such
that $i \ge N$ we have

$$|v(x_i) - v(x_*)| \le \varepsilon,$$

proving in turn the Lemma. ///
2. FEJER MAPS

DEFINITION 2.4.1. The point-set mapping $W: E^n \to 2^{E^n}$, having a
non-empty set of fixed points $X_*$, is called $X_*$-Fejér if the
following two conditions are satisfied:

1. $W(x) = \{x\}$ $\forall x \in X_*$;

2. $\|z - y\| < \|x - y\|$ $\forall y \in X_*$, $\forall x \notin X_*$, $\forall z \in W(x)$.

The simplest example of an $X_*$-Fejér mapping is the operator
of projection onto the compact convex set $X_*$, considered in
Section [...] (see formula [...]). In this case

$$W(x) = \operatorname{Arg\,min}_{y \in X_*} \|x - y\|.$$

THEOREM 2.4.1 (I.I. Eremin). Let the point-set mapping W(x) be
closed and $X_*$-Fejér. Then for any fixed point $x_0$ each sequence
$\{x_k\}$ defined from (4.1) converges to a point $x_*$ belonging to
$X_*$ and depending only on the choice of $x_0$.
Proof. Let y be an arbitrary fixed point of $X_*$. Let
$v(x) = \|x - y\|$. Also, we fix the initial point $x_0$. The sequence
$\{x_k\}$ is bounded since for all k the inequality
$\|x_k - y\| \le \|x_0 - y\|$ holds. Therefore, the set of all limit
points of the sequence $\{x_k\}$ is not empty. Let p and q be two
limit points. The sequence $\{v(x_k)\}$ decreases monotonically and
is bounded below by zero. Hence, according to Lemma 2.4.1,

$$v(p) = v(q) = \|p - y\| = \|q - y\|. \tag{4.3}$$

If $p \in X_*$ or $q \in X_*$, then necessarily p = q. Indeed, let
$p \in X_*$; then as y we take p; from (4.3) we obtain
$\|q - p\| = 0$, i.e., p = q.

Let the subsequence $\{x_k\}_{k \in \Lambda_1}$ converge to a point
$p \notin X_*$. We consider the subsequence $\{x_{k+1}\}_{k \in \Lambda_1}$; its
boundedness implies that we can separate a subsequence
corresponding to an index set $\Lambda_2$ such that

$$\lim_{k \to \infty,\, k \in \Lambda_2} x_k = q.$$

If we subtract unity from each element of the set $\Lambda_2$ we obtain
the set $\Lambda_3 \subset \Lambda_1$ with $\lim_{k \to \infty,\, k \in \Lambda_3} x_k = p$. By the
closure of the multivalued mapping W(x) we have $q \in W(p)$. Hence,
taking an arbitrary vector $y \in X_*$, we obtain
$\|q - y\| < \|p - y\|$. But this inequality contradicts (4.3). Whence we
conclude that the sequence $\{x_k\}$ converges to a point belonging
to the set $X_*$. ///


3. THE APPLICATION OF THE THEOREM ON FEJER MAPS 


We consider the problem of finding a point of the set 
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KL Awien™ tht (z)— 0, J © [4re]) 


We assume that each of the functions hd (x) is convex and defined 


everywhere on E". It then £ollows from Theorem 1.1.6 that the 
Cc s i 
UN Cit OT) MS.) es ee) is convex as well. For each point 
van 
x ¢ EY we define the mutlivalued mapping 


$$W(x) = \begin{cases} x - \lambda \dfrac{S(x)}{\|H(x)\|^2}\, H(x), & x \notin X_*, \\[1ex] x, & x \in X_*. \end{cases} \qquad (4.4)$$

Here $\lambda$ is a number and $H(x) \in \partial S(x)$.
THEOREM 2.4.2 (I.I. Eremin). Let $X_* \ne \emptyset$, and let the multivalued mapping $W(x)$ be given by (4.4), where $0 < \lambda < 2$. Then each sequence $\{x_k\}$ defined by (4.1) converges to the set $X_*$ globally.
Proof. We show that the mapping $W$ is $X_*$-Fejér. The justification of the closure follows from Theorem 1.2.4. Let $x \notin X_*$; we show that $\|H(x)\| \ne 0$. We assume the opposite is true: $H(x) = 0$. Then, by the inequality (1.2.2), for any $y \in E^n$ the inequality $S(x) \le S(y)$ holds, but this contradicts the fact that $x \notin X_*$. Hence $\|H(x)\| \ne 0$ everywhere outside $X_*$. For any $x \notin X_*$, $y \in X_*$, $z \in W(x)$ we have

$$\|z - y\|^2 = \left\| x - y - \frac{\lambda S(x)}{\|H(x)\|^2} H(x) \right\|^2 = \|x - y\|^2 + \frac{2\lambda S(x)}{\|H(x)\|^2} \langle H(x), y - x \rangle + \frac{\lambda^2 S^2(x)}{\|H(x)\|^2}.$$

Using the inequality (1.2.2), we obtain

$$\|z - y\|^2 \le \|x - y\|^2 + \frac{2\lambda S(x)}{\|H(x)\|^2} \bigl(S(y) - S(x)\bigr) + \frac{\lambda^2 S^2(x)}{\|H(x)\|^2} \le \|x - y\|^2 - \frac{\lambda(2 - \lambda) S^2(x)}{\|H(x)\|^2}.$$

Therefore, for any $0 < \lambda < 2$ the multivalued mapping $W(x)$ is $X_*$-Fejér and, by Theorem 2.4.1, the iterations (4.1) converge to $X_*$. ///
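The relaxation step of the form (4.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes $S(x) = \max\{0, h^1(x), \dots, h^c(x)\}$, uses the gradient of the largest constraint as $H(x)$, and the two linear constraints and the names `fejer_step` and `solve_inequalities` are hypothetical, not taken from the text.

```python
def fejer_step(x, constraints, lam=1.0):
    """One relaxation step x - lam * S(x) * H / ||H||^2.

    constraints: list of (h, grad_h) pairs of callables.
    Assumption: S(x) = max(0, max_j h_j(x)), H = gradient of the largest h_j.
    """
    values = [h(x) for h, _ in constraints]
    j = max(range(len(values)), key=values.__getitem__)
    s = values[j]
    if s <= 0.0:  # x already lies in X_*: the mapping leaves it fixed
        return x
    g = constraints[j][1](x)
    norm2 = sum(gi * gi for gi in g)
    return [xi - lam * s * gi / norm2 for xi, gi in zip(x, g)]


def solve_inequalities(x0, constraints, lam=1.0, max_iter=1000, tol=1e-12):
    """Iterate the relaxation step until all constraints hold within tol."""
    x = list(x0)
    for _ in range(max_iter):
        if max(h(x) for h, _ in constraints) <= tol:
            break
        x = fejer_step(x, constraints, lam)
    return x


# Hypothetical example: x1 + x2 <= 1 and x1 - x2 <= 0 in the plane.
constraints = [
    (lambda x: x[0] + x[1] - 1.0, lambda x: [1.0, 1.0]),
    (lambda x: x[0] - x[1], lambda x: [1.0, -1.0]),
]
x_final = solve_inequalities([5.0, -3.0], constraints, lam=1.0)
```

With `lam=1.0` each step is an exact projection onto the most violated half-space; any value in the open interval (0, 2) is admissible according to Theorem 2.4.2.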

A more detailed discussion of the theory of Fejér maps and their applications to the solution of applied problems can be found


Ula Ee MA He enClee Vice mm Niaz Came ellie 


4. NONSTATIONARY PROCESSES 


Let the process (4.1) be defined by the nonstationary multivalued mapping $W(x,k)$:

$$x_{k+1} \in W(x_k, k).$$


We pose the problem of finding the points 


oe ee Ga xk a Wx kk 


To illustrate such a process, we cite the method of a generalized gradient for finding an unconstrained minimum of the function $f(x)$, $x \in E^n$:

$$x_{k+1} = x_k - \alpha_k \gamma_k H(x_k), \qquad H(x_k) \in \partial f(x_k). \qquad (4.5)$$

Here $\alpha_k$, $\gamma_k$ are some positive coefficients; various laws of their variation are feasible -- we shall give three of them:

$$\gamma_k = \|H(x_k)\|^{-1}, \quad \sum_k \alpha_k = \infty, \quad \sum_k \alpha_k^2 < \infty, \qquad (4.6)$$
Me=[4+1 (PI, Soy =00, +0, (4.7) 
$$\gamma_k = \text{const}, \quad \sum_k \alpha_k = \infty, \quad \alpha_k \to 0, \qquad (4.8)$$


WMaxeroe: 0) << Gl Ske Buclouhineeey ¢ 
We restrict ourselves to the justification of the method (4.5) with the regulation rule (4.6).


THEOREM 2.4.3. Let the function $f(x)$ be defined everywhere on $E^n$, be convex, and let the set of its minima $X_*$ be nonempty. Then, using the method of the generalized gradient (4.5) with the regulation (4.6), we can separate from the sequence $\{x_k\}$ a subsequence $\{x_{k_i}\}$ such that

$$\lim_{i \to \infty} f(x_{k_i}) = f(x_*), \qquad x_* \in X_*. \qquad (4.9)$$
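Proof. Let $x_* \in X_*$, $v_k = \|x_k - x_*\|^2$; then

$$v_{k+1} = v_k - 2\alpha_k\gamma_k \langle H(x_k), x_k - x_* \rangle + \alpha_k^2\gamma_k^2 \|H(x_k)\|^2,$$

implying in turn that

$$v_{k+1} = v_0 - 2\sum_{s=0}^{k} \alpha_s\gamma_s \langle H(x_s), x_s - x_* \rangle + \sum_{s=0}^{k} \alpha_s^2\gamma_s^2 \|H(x_s)\|^2. \qquad (4.10)$$

By the convexity of $f(x)$ on $E^n$ we have $0 \le \langle H(x), x - x_* \rangle$. Hence the relations

$$0 \le v_{k+1} + 2\sum_{s=0}^{k} \alpha_s \left\langle \frac{H(x_s)}{\|H(x_s)\|}, x_s - x_* \right\rangle \le v_0 + \sum_{s=0}^{k} \alpha_s^2$$

follow from (4.10) if we take into account (4.6).

From the boundedness of the right side by a constant not depending on $k$, it follows that as $k \to \infty$ the sequences $\{v_k\}$, $\{x_k\}$, $\{H(x_k)\}$ are bounded. From (4.10) we obtain also

$$2a \sum_{s=0}^{k} \alpha_s \langle H(x_s), x_s - x_* \rangle \le v_0 + \sum_{s=0}^{k} \alpha_s^2,$$

where the constant $a > 0$ (a lower bound for $\gamma_s$, which exists because $\{H(x_s)\}$ is bounded) does not depend on $k$. The right-hand side of the inequality is bounded as $k \to \infty$; by (4.6), $\sum_{s=0}^{\infty} \alpha_s = \infty$, but this is possible only if $\langle H(x_k), x_k - x_* \rangle \to 0$ along some subsequence. From the convexity property

$$0 \le f(x_k) - f(x_*) \le \langle H(x_k), x_k - x_* \rangle$$

we conclude that there exists a subsequence $\{x_{k_i}\}$ such that the condition (4.9) is satisfied. ///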
The corresponding theorems for the regulations (4.7) and (4.8) are similar, and the convergence of the method under these regulations can be proved in the same way. The method of the generalized gradient is extensively studied in the literature; we refer, for example, to Yu.M. Ermol'ev [2] and N.Z. Shor [1].
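As an illustration of the iteration (4.5), the following Python sketch runs a normalized subgradient step on a small nonsmooth convex function. It is a sketch under assumptions: the step rule is taken as $\gamma_k = 1/\|H(x_k)\|$, $\alpha_k = 1/(k+1)$ (so that the sum of $\alpha_k$ diverges and the sum of $\alpha_k^2$ converges), and the test function and the name `subgradient_method` are hypothetical. Because the theorem only guarantees convergence along a subsequence, the sketch keeps the best point seen.

```python
import math


def subgradient_method(f, subgrad, x0, n_iter=2000):
    """Generalized gradient iteration x_{k+1} = x_k - a_k * g_k * H(x_k).

    Assumed step rule: g_k = 1/||H(x_k)||, a_k = 1/(k+1).
    Returns the best point seen, since f(x_k) need not decrease monotonically.
    """
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for k in range(n_iter):
        h = subgrad(x)
        norm = math.sqrt(sum(hi * hi for hi in h))
        if norm == 0.0:  # zero subgradient: x is a minimizer
            return x, f(x)
        step = 1.0 / ((k + 1) * norm)
        x = [xi - step * hi for xi, hi in zip(x, h)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


# Hypothetical nonsmooth convex test function f(x) = |x1 - 1| + 2|x2 + 0.5|.
def f(x):
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 0.5)


def subgrad(x):
    sign = lambda t: (t > 0) - (t < 0)
    return [float(sign(x[0] - 1.0)), 2.0 * sign(x[1] + 0.5)]


x_best, f_best = subgradient_method(f, subgrad, [3.0, 2.0])
```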


5. METHODS FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS 


1. A METHOD OF SIMPLE ITERATION 


Let the mapping $V: R^n \to R^n$. We pose the problem of solving a system of $n$ equations with $n$ unknowns

$$V(x) = 0. \qquad (5.1)$$

Using the formula (3.8), we define the mapping $T(x)$; then the problem considered becomes equivalent to the problem of finding fixed points of the mapping $T$. The method of simple iteration (3.1) leads us to the following method for solving the system (5.1):

$$x_{k+1} = T(x_k) = x_k - \tau V(x_k). \qquad (5.2)$$

From Theorems 2.3.1 and 2.3.3 we obtain the following result.
THEOREM 2.5.1. Let the mapping $V: R^n \to R^n$ be Lipschitz with constant $\ell$ and uniformly monotone with monotonicity constant $m$. Then Equation (5.1) has a unique solution $x = x_*$ and for any $\tau \in (0, 2m/\ell^2)$ the iterations defined by (5.2) converge to $x_*$ globally. We have the following estimate of the convergence rate:


k 


= Zoe NV(xg)lI 


T 
Sco sox fod 


Boxy || 


If the solution $x = x_*$ to the system (5.1) exists and the mapping $V$ is differentiable at the point $x_*$, then, by Theorem 2.3.4, for the local linear convergence of the iterations (5.2) it is sufficient that the spectral radius of the matrix $T_x^T(x_*) = I_n - \tau V_x^T(x_*)$ be strictly less than one.

Let us consider two equations:

$$|V_x^T(x_*) - \eta I_n| = 0,$$
$$|T_x^T(x_*) - \kappa I_n| = |-\tau V_x^T(x_*) + (1 - \kappa) I_n| = 0.$$

The roots of these equations are related in a simple way: $\kappa = 1 - \tau\eta$. Let the complex root be $\eta = a + ib$; then $\kappa = 1 - \tau a - i\tau b$, $|\kappa|^2 = (1 - \tau a)^2 + \tau^2 b^2$. The condition $|\kappa| < 1$ becomes then

$$\tau^2 (a^2 + b^2) - 2\tau a < 0. \qquad (5.3)$$

If $a = \operatorname{Re} \eta > 0$, (5.3) holds for any $0 < \tau < \bar\tau$, where $\bar\tau = 2a/(a^2 + b^2)$; if $a < 0$, (5.3) holds for any $\bar\tau < \tau < 0$.

Therefore, the spectral radius of the matrix $T_x^T(x_*)$ is less than one iff one of the following conditions is satisfied:

1. the eigenvalues $\eta_1, \dots, \eta_n$ of the matrix $V_x^T(x_*)$ satisfy the conditions $\operatorname{Re} \eta_i > 0$, $i \in [1:n]$, and
$$0 < \tau < \frac{2 \operatorname{Re} \eta_i}{|\eta_i|^2}, \qquad i \in [1:n];$$

2. the eigenvalues $\eta_1, \dots, \eta_n$ of the matrix $V_x^T(x_*)$ satisfy the conditions $\operatorname{Re} \eta_i < 0$, $i \in [1:n]$, and
$$\frac{2 \operatorname{Re} \eta_i}{|\eta_i|^2} < \tau < 0, \qquad i \in [1:n].$$

When either condition is satisfied the iterations (5.2) converge locally to the point $x_*$. If the matrix $V_x(x_*)$ is symmetric, the eigenvalues $\eta_i$ will be real, and the conditions (1.) and (2.) will be satisfied only if the matrix $V_x(x_*)$ is either positive definite or negative definite.
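The bound on $\tau$ in condition (1.) can be checked numerically. The Python sketch below is an illustration under assumptions: it uses a hypothetical linear mapping $V(x) = Ax - b$ in the plane whose matrix has the complex eigenvalues $1 \pm 2i$, so the admissible interval is $0 < \tau < 0.4$; the helper names `eig2`, `tau_bound` and `simple_iteration` are not from the text.

```python
import cmath


def eig2(a):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def tau_bound(a):
    """min_i 2 Re(eta_i) / |eta_i|^2, meaningful when all Re(eta_i) > 0."""
    etas = eig2(a)
    assert all(e.real > 0 for e in etas)
    return min(2.0 * e.real / abs(e) ** 2 for e in etas)


def simple_iteration(v, x0, tau, n_iter):
    """Iteration x_{k+1} = x_k - tau * V(x_k) of the form (5.2)."""
    x = list(x0)
    for _ in range(n_iter):
        vx = v(x)
        x = [xi - tau * vi for xi, vi in zip(x, vx)]
    return x


A = [[1.0, -2.0], [2.0, 1.0]]   # eigenvalues 1 +/- 2i
b = [1.0, 3.0]                  # the solution of A x = b is (1.4, 0.2)


def V(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


tau_max = tau_bound(A)
x_good = simple_iteration(V, [0.0, 0.0], 0.5 * tau_max, 200)  # inside the bound
x_bad = simple_iteration(V, [0.0, 0.0], 1.5 * tau_max, 200)   # outside: diverges
```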

If the mapping V(x) is potential and the potential is given 
by the function f(x), the method (5.2) can be regarded as a 
numerical method for finding local extrema of the function (see 
A case of finding the minimum was considered in Subsection Zor 


the method (5.2) coincides in this case with (C335 HUEY 


2. AN ANALOG OF THE METHOD OF THE GENERALIZED GRADIENT


As was pointed out in Subsection 2.3.6, methods of minimizing con- 
vex functions can be treated as methods for finding fixed points 
of the corresponding monotone mappings. Hence the method of the 
generalized gradient described in Section 2.4 is equivalent to 
the iteration 


$$x_{k+1} = x_k - \alpha_k \gamma_k V(x_k), \qquad (5.4)$$

where the nonnegative coefficients $\alpha_k$, $\gamma_k$ vary according to one of the rules (4.6)-(4.8). Since the proof of the convergence repeats almost verbatim the proof of Theorem 2.4.3, we restrict ourselves to stating the theorem on convergence.

THEOREM 2.5.2. Let the solution $x = x_*$ to the system (5.1) exist. If the mapping $V(x)$ is bounded and uniformly monotone and the step-length rule (4.6) is used, the iterations defined by (5.4) converge to the point $x_*$ globally.


3. NEWTON'S METHOD 


A most widely used method for solving the system (5.1) is Newton's method in which iterations are given by the formulas

$$x_{k+1} = T(x_k), \qquad T(x) = x - [V_x^T(x)]^{-1} V(x), \qquad (5.5)$$

i.e., the method of simple iteration for finding fixed points of the mapping $T(x)$. If the mapping $V$ is sufficiently smooth,


then 


T (x) =[Ve (*)]-? > [Vir (x)] V (x). 


Therefore, $T_x(x_*) = 0$, which by Theorem 2.3.4 ensures a high rate of convergence $x_k \to x_*$.

Basic properties of the Newton method are reflected in the following theorem.


THEOREM 2.5.3. Let the mapping $V: R^n \to R^n$ be differentiable in some neighborhood $G(x_*)$ of the solution $x_*$ of the system (5.1), the derivative $V_x(x)$ being continuous at $x_*$ and the matrix $V_x(x_*)$ being nonsingular. Then the iterations defined by the formulas (5.5) converge locally to the point $x_*$ at a superlinear rate. If, furthermore, there exists a constant $\ell$ such that




5) a Wy Dalle eggtllzer Rath e totter Ghee) 3 (5.6) 


the iterations (5.5) possess the local quadratic rate of convergence.

This assertion will obviously follow from Theorem 2.5.4 below.
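The fast local convergence of (5.5) is easy to observe numerically. The following Python sketch is an illustration under assumptions: it applies Newton's method to a hypothetical system of two equations, $x_1^2 + x_2^2 - 4 = 0$ and $x_1 x_2 - 1 = 0$, solves the 2x2 linear system by Cramer's rule, and records the residual norms; the names `solve2` and `newton` are not from the text.

```python
import math


def solve2(j, rhs):
    """Solve a 2x2 linear system j * p = rhs by Cramer's rule."""
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    return [(rhs[0] * j[1][1] - j[0][1] * rhs[1]) / det,
            (j[0][0] * rhs[1] - rhs[0] * j[1][0]) / det]


def newton(v, jac, x0, tol=1e-13, max_iter=20):
    """Newton iteration; jac(x) returns the matrix of partials dV_i/dx_j."""
    x = list(x0)
    history = []  # residual norms ||V(x_k)||, to inspect the convergence rate
    for _ in range(max_iter):
        vx = v(x)
        res = math.hypot(vx[0], vx[1])
        history.append(res)
        if res <= tol:
            break
        p = solve2(jac(x), [-vx[0], -vx[1]])  # Newton direction
        x = [x[0] + p[0], x[1] + p[1]]
    return x, history


def V(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def J(x):
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


x_newton, residuals = newton(V, J, [2.0, 0.5])
```

Printing `residuals` shows the number of correct digits roughly doubling from one iteration to the next, as the quadratic rate suggests.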
REMARK. In what follows Theorem 2.5.3 will frequently be applied to various specific problems. To simplify the formulations we shall write instead of the condition (5.6): the derivative $V_x(x)$ satisfies a Lipschitz condition in the neighborhood of the point


PCV Cy J) Be Syl WoRgy te Gls) 


In the subsequent chapters, the investigation of the method (5.5) is preceded by the justification of the method of simple iteration (5.2) with $\tau = 1$. In justifying the method we prove that the spectral radius of the matrix $I_n - V_x^T(x_*)$ is strictly less than one. Neumann's lemma (see Appendix II) implies then the nonsingularity of the matrix $V_x(x_*)$, and to ensure the quadratic convergence rate of (5.5) it is required only that either (5.6) be satisfied or the Lipschitz condition be satisfied in the neighborhood of $x_*$.


4. MODIFICATIONS OF NEWTON'S METHOD


A high rate of convergence of the Newton method explains its wide 
application to solving various applied problems. At the same time, 


however, the method has a few shortcomings; the most essential 


ones are the following: 
1. time-consuming computations because of the need to compute $n^2$ partial derivatives;

2. the local nature of convergence. If the initial approximation is poor, the method often diverges;

3. the need to solve the system of linear equations $V_x^T(x_k) p_k = -V(x_k)$ in order to construct the Newton direction $p_k$, which requires arithmetic operations of order $n^3$.

Many works of recent years have been aimed at eliminating these shortcomings, and as a result, many modifications of the method have been created. We shall discuss some of them.

In a numerical realization of the method, a finite difference approximation of the matrix $V_x(x)$ is usually used instead of the matrix $V_x(x)$ itself. In this case the question arises: How does one choose the size of the step of numerical differentiation, and what differentiation scheme should be used so that the method preserves its high rate of convergence? The theoretical feasibility of using a finite difference approximation follows from the next theorem.


THEOREM 2.5.4. Let the mapping $V: R^n \to R^n$ be differentiable in some neighborhood $G(x_*)$ of the solution $x_*$ to the system (5.1), let the derivative $V_x(x)$ be continuous at $x_*$ and let the matrix $V_x(x_*)$ be nonsingular; let the matrix $W(x)$ satisfy

$$\lim_{x \to x_*} \|W(x) - V_x(x)\| = 0. \qquad (5.7)$$

Then the sequence

$$x_{k+1} = T(x_k), \qquad T(x) = x - [W^T(x)]^{-1} V(x), \qquad (5.8)$$

converges locally to the point $x_*$ at a superlinear rate. If, in addition, the inequality (5.6) is satisfied and

$$\|W(x) - V_x(x)\| \le c\|V(x)\|, \qquad (5.9)$$

the iterations (5.8) possess a local quadratic rate of convergence.


Proof. We show first that the mapping $T(x)$ is differentiable at $x_*$ and $T_x(x_*) = 0$. This is equivalent to the assertion

$$\lim_{x \to x_*} \frac{\|T(x) - T(x_*)\|}{\|x - x_*\|} = 0. \qquad (5.10)$$

We have the estimates

$$\|T(x) - T(x_*)\| = \|x - x_* - [W^T(x)]^{-1} V(x)\| \le \|[W^T(x)]^{-1}\| \, \|W^T(x)(x - x_*) - V(x)\|$$
$$\le \|[W^T(x)]^{-1}\| \bigl( \|W^T(x) - V_x^T(x)\| \, \|x - x_*\| + \|V_x^T(x)(x - x_*) - V(x)\| \bigr). \qquad (5.11)$$

From (5.7) and the continuity of the derivative $V_x(x)$ at the point $x_*$ it follows that

$$\lim_{x \to x_*} \|[W^T(x)]^{-1}\| = \|[V_x^T(x_*)]^{-1}\|,$$
$$\lim_{x \to x_*} \frac{\|V_x^T(x)(x - x_*) - V(x)\|}{\|x - x_*\|} = 0,$$

whence, taking (5.7) and (5.11) into account, we obtain (5.10), and by Theorem 2.3.4 the superlinear rate of convergence $x_k \to x_*$ holds.

Let the inequalities (5.6) and (5.9) be satisfied. Then from (5.11) we have

$$\|x_{k+1} - x_*\| = \|T(x_k) - T(x_*)\| \le \|[W^T(x_k)]^{-1}\| \bigl( c\|V(x_k)\| \, \|x_k - x_*\| + \ell \|x_k - x_*\|^2 \bigr),$$

implying the quadratic rate of convergence, since

$$\limsup_{k \to \infty} \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|^2} \le \lim_{k \to \infty} \|[W^T(x_k)]^{-1}\| \left( c \limsup_{k \to \infty} \frac{\|V(x_k)\|}{\|x_k - x_*\|} + \ell \right) \le \|[V_x^T(x_*)]^{-1}\| \bigl( c\|V_x^T(x_*)\| + \ell \bigr).$$


As the simplest finite difference approximation we can take the matrix $W(x)$ in which the $i$th row is

$$\left[ \frac{V(x + h^i e_i) - V(x)}{h^i} \right]^T, \qquad (5.12)$$

where $e_i$ is the $i$th unit vector and $h^i$ is the differentiation step. Applying Theorem 2.5.4, we can show that if $h_k^i \to 0$ as $k \to \infty$ ($1 \le i \le n$), the sequence (5.8) involving the approximation (5.12) converges locally to $x_*$ at a superlinear rate. If, in addition, the inequality (5.6) is satisfied and $|h_k^i| \le c\|V(x_k)\|$, the rate of convergence is quadratic.


To enlarge the domain of convergence of the method, a special control of the step length $\alpha_k$ is usually carried out on each iteration:

$$x_{k+1} = x_k + \alpha_k p_k, \qquad p_k = -[V_x^T(x_k)]^{-1} V(x_k). \qquad (5.13)$$
k+1 k kako k Soe mek: k : d 


Ime thelvOLrksO tei mek me boae yn an GmVen Ve SON nis (1a) Vice Ni a luebecden, 
fil], &. Gleyzal [1], C. Haselgrove [1], and in many others, it has 
been suggested to reduce in (5.13) the number $\alpha_k$, beginning with $\alpha_k = 1$, until the condition $\|V(x_{k+1})\| < \|V(x_k)\|$ is satisfied, and only then go over to the next iteration. This step-length rule, however, does not guarantee the global convergence of the method (5.13). We shall give other step-length rules for the choice of $\alpha_k$ guaranteeing global convergence and preserving the quadratic rate of convergence near the point $x_*$.

Let $r > 0$, $\varepsilon \in (0, \min\{1, r\})$, $\lambda \in (0,1)$ be given and let a specific norm in $R^n$ be chosen.


take a where 


k > 
satisfying the inequality 


RULE A Sac 


UNG 2opedl pos 


RULE 2. Find an integer i > 0, 


the condition 


One 
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i > 0 .is the! smallest integer 


(1-7) []VCx,) {I (5.14) 


a Sean Ue. eles Choose from 


Me 


= Arg min Si xp tele, 


O<j<i 


RULE 3. Choose $\alpha_k$ from the condition

$$\alpha_k = \lambda^{i_k}, \qquad i_k = \arg\min_i \|V(x_k + \lambda^i p_k)\|,$$

where $i$ assumes all possible integer values.

RULE 4. Verify the inequality (5.14) for $i = 0$. If it is satisfied, put $\alpha_k = 1$; otherwise continue seeking according to Rule 3.

RULE 5. Find $\alpha_k$ from the condition

$$\alpha_k = \arg\min_{\alpha} \|V(x_k + \alpha p_k)\|.$$


In step-length Rules 1 and 2 the integer $i$ is found by multiplying the initial step $\alpha = 1$ successively by $\lambda$ until the inequality (5.14) is satisfied. It is interesting to note that for any a priori specified norm in $R^n$ the quantity $\|V(x_k)\|$ decreases monotonically (from one iteration to the next). This makes it possible, in choosing the norm in $R^n$, to take into account the specific nature of the system (5.1).
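The effect of step-length control can be seen on a classical scalar example. The Python sketch below is an illustration under assumptions: it uses the hypothetical equation $\arctan x = 0$, for which the pure Newton iteration diverges from $x_0 = 3$, and a backtracking rule in the spirit of Rule 1 with the sufficient-decrease test $|V(x_k + \alpha p_k)| \le (1 - \varepsilon\alpha)|V(x_k)|$, $\alpha = \lambda^i$. This test is an assumed stand-in for the inequality (5.14) in the case $r = 1$, and the name `newton_1d` is not from the text.

```python
import math


def newton_1d(v, dv, x0, n_iter, eps=None, lam=0.5, max_halvings=60):
    """Scalar Newton iteration x <- x + alpha * p with p = -v(x)/dv(x).

    eps=None gives the pure method (alpha = 1).  Otherwise alpha = lam**i with
    the smallest i >= 0 such that |v(x + alpha p)| <= (1 - eps*alpha) |v(x)|,
    an assumed sufficient-decrease test standing in for (5.14) with r = 1.
    """
    x = x0
    for _ in range(n_iter):
        vx = v(x)
        if vx == 0.0:
            break
        p = -vx / dv(x)
        alpha = 1.0
        if eps is not None:
            for _halving in range(max_halvings):
                if abs(v(x + alpha * p)) <= (1.0 - eps * alpha) * abs(vx):
                    break
                alpha *= lam
        x = x + alpha * p
    return x


v = math.atan
dv = lambda x: 1.0 / (1.0 + x * x)

x_pure = newton_1d(v, dv, 3.0, n_iter=5)              # no step control
x_damped = newton_1d(v, dv, 3.0, n_iter=20, eps=0.1)  # backtracking on alpha
```

After the first damped step the test accepts $\alpha = 1$ at every iteration, so the fast local convergence of the undamped method is retained near the root.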


THEOREM 2.5.5 (O.P. Burdakov [3]). Let the mapping $V: R^n \to R^n$ be differentiable in $R^n$ and let the derivative $V_x(x)$ satisfy the inequalities

$$\|V_x^T(x) - V_x^T(y)\| \le \ell \|x - y\| \qquad \forall x, y \in R^n, \qquad (5.15)$$

$$\|[V_x^T(x)]^{-1}\| \le m \qquad \forall x \in R^n. \qquad (5.16)$$

Then, for any values of the parameters $r > 0$, $\lambda \in (0,1)$, $\varepsilon \in (0, \min\{1, r\})$, the modified Newton method (5.13) using step-length Rules 1-5 converges to the unique solution $x_*$ of the system (5.1) from any initial point $x_0 \in R^n$, the rate of convergence being quadratic near the solution.

Proof. We note that by Hadamard's theorem [1] the inequality (5.16) implies the existence and uniqueness of a solution of the system (5.1). We omit the trivial case where $V(x_k) = 0$.

First we consider Rule 1 for $r = 1$. Assuming that $\alpha \in [0,1]$, we use the Newton-Leibniz formula and the inequality (5.15):

$$\|V(x_k + \alpha p_k)\| = \left\| V(x_k) + \alpha \int_0^1 V_x^T(x_k + t\alpha p_k)\, p_k \, dt \right\|$$
$$= \left\| V(x_k) + \alpha \int_0^1 \bigl( V_x^T(x_k + t\alpha p_k) - V_x^T(x_k) \bigr) p_k \, dt - \alpha V(x_k) \right\| \le (1 - \alpha)\|V(x_k)\| + \alpha^2 \ell \|p_k\|^2.$$

Noting (5.16), we obtain the relation

$$\|V(x_k + \alpha p_k)\| \le \bigl( 1 - \alpha (1 - \alpha \ell m^2 \|V(x_k)\|) \bigr) \|V(x_k)\|. \qquad (5.17)$$

It is seen that the inequality (5.14) will surely be satisfied for all $\alpha \in [0, \min\{1, \bar\alpha_k\}]$, where $\bar\alpha_k = \dfrac{1 - \varepsilon}{\ell m^2 \|V(x_k)\|}$; then $\alpha_k \ge \min\{1, \lambda\bar\alpha_k\}$. It is not hard to see that the sequence $\{\bar\alpha_k\}$ increases monotonically. Therefore,

$$\|V(x_{k+1})\| \le q^{k+1} \|V(x_0)\|, \qquad (5.18)$$

where $q = 1 - \varepsilon \min\{1, \lambda\bar\alpha_0\} < 1$. Whence we conclude that $\|V(x_k)\| \to 0$ as $k \to \infty$. Therefore, the sequence $x_k$ converges to the solution $x_*$ from any initial point. The sequence $\bar\alpha_k \to \infty$; therefore, beginning from some iteration, all $\alpha_k = 1$ and, by Theorem 2.5.3, the rate of convergence is quadratic. For $r > 1$ the inequality (5.14) is satisfied for all $\alpha \in [0, \min\{1, \bar\alpha_k\}]$. The further considerations are similar to those given above.

Now let r <€ (¢,1). The inequality (5.14) will be satisfied 


forall .@ < \{0, “man Livia, where 
yp = (1S) [Im |V (xy). 


Indeed, from (5.17) we obtain 


IV (top i<(1— 2h 8) ry ypc 


T 


< (1-2) IV (x) 1< 1 —ae)"]V (x) 


The further considerations are similar to the case $r = 1$.

Next we consider Rules 2-5. Suppose we have $x_k$ and $p_k$. If we compare the quantities $\|V(\tilde x_{k+1})\|$ and $\|V(x_{k+1})\|$ obtained through one of Rules 2-5 and Rule 1, respectively, it is not hard to see that $\|V(\tilde x_{k+1})\| \le \|V(x_{k+1})\|$. Therefore, the estimate (5.18) remains valid, implying in turn the convergence from any initial point $x_0 \in R^n$. We note next that

$$\|x - x_*\| \le m \|V(x)\| \qquad \forall x \in R^n$$

and

$$\|V(x)\| \le 2\|V_x(x_*)\| \, \|x - x_*\|$$

for all $x$ from some neighborhood $G(x_*)$. Applying these estimates to the inequality

$$\|V(x_{k+1})\| \le \|V(x_k + p_k)\| \le \ell m^2 \|V(x_k)\|^2$$

holding near the solution $x_*$, we obtain the quadratic rate of convergence for Rules 2-5. ///

It should be emphasized that we have succeeded in preserving a high rate of convergence of the Newton method owing to the fact that step-length Rules 1-4, beginning from some iteration number, yield $\alpha_k = 1$, and in Rule 5 $\alpha_k \to 1$ as $k \to \infty$.

The main results of this Theorem still hold under weaker assumptions. Thus, for example, global convergence can be obtained if one requires the continuity of $V_x(x)$ in $R^n$ rather than the Lipschitz inequality (5.15). One may also get rid of the inequality (5.16). In that case we have the following theorem.

THEOREM 2.5.6 (O.P. Burdakov [3]). Suppose there exists an open convex domain $D \subset R^n$ such that the mapping $V: D \to R^n$ is continuously differentiable in $D$. Let a point $\bar x \in D$ be such that the set

$$L(\bar x) = \{x : \|V(x)\| \le \|V(\bar x)\|,\ x \in D\}$$

is simply connected and compact, and the Jacobian $V_x(x)$ is nonsingular in $L(\bar x)$. We assume that in Rules 1-5 all the testing points $x_k + \alpha p_k$ are chosen from $D$. Then, for any values of the parameters $r > 0$, $\lambda \in (0,1)$, $\varepsilon \in (0, \min\{1, r\})$, the modified Newton method (5.13) converges to a unique solution in $L(\bar x)$ of the system (5.1) for any $x_0 \in L(\bar x)$. The rate of convergence is superlinear, and if, in addition, the inequality (5.6) is satisfied, the rate of convergence is quadratic as well.

We note that when the set $L(\bar x)$ is not simply connected, its simply connected components determine domains of attraction of the modification (5.13).

Among step-length Rules 1-5 it is impossible to single out one that is preferable in all cases. Each rule has its own advantages and disadvantages. In practice, the choice of a specific rule depends on how difficult the computation of the Newton direction $p_k$ is, compared with the computation of the quantity $V(x)$. If these computational difficulties are very different, Rules 3-5 are more applicable; if they differ only insignificantly, Rules 1 and 2 are more applicable.


5. METHODS OF THE QUASI-NEWTON TYPE 


The shortcomings (1.) and (3.) of the Newton method, as noted in Subsection 2.5.4, are absent in the quasi-Newton methods (they are sometimes referred to as variable metric methods). These methods are based on the notion of sequential approximation, either of $V_x^T$ or of $[V_x^T]^{-1}$. For the approximation, only those values of the mapping $V$ are used which have been already computed on the preceding iterations.

If $V_x^T$ is approximated, the iteration process has the form

$$x_{k+1} = x_k - B_k^{-1} V(x_k). \qquad (5.19)$$

When the matrix $[V_x^T]^{-1}$ is approximated, the computation is done by the formula

$$x_{k+1} = x_k - H_k V(x_k). \qquad (5.20)$$
The $(n \times n)$-matrices $B_k$ and $H_k$ are updated according to the formulas given below. We assume that $B_0$ and $H_0$ are a priori specified.

If the mapping $V$ is sufficiently smooth, for small variations $\Delta x_k = x_{k+1} - x_k$ we have

$$V(x_{k+1}) \approx V(x_k) + V_x^T(x_{k+1}) \Delta x_k.$$

It is appropriate to require that the matrix $B_{k+1}$ should satisfy the equality

$$V(x_{k+1}) = V(x_k) + B_{k+1} \Delta x_k,$$

which indicates the proximity along the direction $\Delta x_k$ of the matrix $B_{k+1}$ to $V_x^T(x_{k+1})$. We denote $\Delta V_k = V(x_{k+1}) - V(x_k)$ and rewrite this relation as

$$B_{k+1} \Delta x_k = \Delta V_k. \qquad (5.21)$$

A similar approach is possible for an approximation of the matrix $[V_x^T(x_{k+1})]^{-1}$. Assume that $\Delta x_k$ is small; then

$$[V_x^T(x_{k+1})]^{-1} \Delta V_k \approx \Delta x_k.$$

We require that the matrix $H_{k+1}$ satisfy the equality

$$H_{k+1} \Delta V_k = \Delta x_k, \qquad (5.22)$$

indicating a sufficient proximity along the vector $\Delta V_k$ of the matrix $H_{k+1}$ to the matrix $[V_x^T(x_{k+1})]^{-1}$.

The relations (5.21) and (5.22) are known as quasi-Newton conditions. They do not determine uniquely the matrices $B_{k+1}$ and $H_{k+1}$. Further hypotheses with regard to these approximations are
needed. Techniques for the computation of the matrix $B_{k+1}$ from $B_k$ (as, actually, the matrix $H_{k+1}$ from $H_k$) are based on quite reasonable requirements. Namely, given $B_k$ it is necessary to construct a matrix $B_{k+1}$ such that, first, the quasi-Newton relation (5.21) is satisfied and, second, the approximation of $B$ does not change in some $(n-1)$-dimensional subspace $\Pi_k \subset R^n$ not containing $\Delta x_k$, i.e.,

$$B_{k+1} p = B_k p \qquad \forall p \in \Pi_k. \qquad (5.23)$$

Therefore, we have a system of $n^2$ equations (5.21), (5.23) with respect to $n^2$ unknown elements of the matrix $B_{k+1}$. The unique solution is

$$B_{k+1} = B_k + \frac{(\Delta V_k - B_k \Delta x_k)\, c_k^T}{\langle c_k, \Delta x_k \rangle}, \qquad (5.24)$$

where $c_k \in R^n$ is a unique (up to the multiplier) vector orthogonal to $\Pi_k$. Each concrete technique for choosing the subspace $\Pi_k$ and therefore the vector $c_k$ gives a definite quasi-Newton method. Thus, if as $\Pi_k$ one takes an orthogonal complement to $\Delta x_k$, i.e., if one requires that the approximation of $B$ does not change along any vector orthogonal to $\Delta x_k$, one obtains Broyden's first rank-1 method. We note that in (5.19) one can get rid of the inversion of the matrix $B_k$ and update the matrix $B_k^{-1}$ instead of $B_k$. To do this, we apply the Sherman-Morrison formula (see Appendix II) yielding

$$B_{k+1}^{-1} = B_k^{-1} + \frac{(\Delta x_k - B_k^{-1} \Delta V_k)\, c_k^T B_k^{-1}}{\langle B_k^{-1} \Delta V_k, c_k \rangle}. \qquad (5.25)$$

Now we concentrate on the updating formulas for $H_k$. The matrix $H_{k+1}$ must satisfy the quasi-Newton relation (5.22), coinciding at the same time with the matrix $H_k$ in the $(n-1)$-dimensional subspace $\Pi_k \subset R^n$ not containing $\Delta V_k$, i.e.,

$$H_{k+1} p = H_k p \qquad \forall p \in \Pi_k,$$

which defines uniquely the updating formula

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, d_k^T}{\langle d_k, \Delta V_k \rangle}, \qquad (5.26)$$

in which the vector $d_k \in R^n$ is orthogonal to the subspace $\Pi_k$. Specifying the form of the subspace $\Pi_k$ (therefore, of the vector $d_k$ as well), one can obtain a concrete quasi-Newton method. If, for example, as $\Pi_k$ one takes the orthogonal complement to $\Delta V_k$, then $d_k = \Delta V_k$ and one obtains Broyden's second rank-1 method.

There is a certain relationship between the formulas (5.24) and (5.26). If the matrices $B_k$ and $H_k$ are nonsingular and $H_k = B_k^{-1}$, then for $c_k = B_k^T d_k$, by (5.25), we obtain $H_{k+1} = B_{k+1}^{-1}$.

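The formulas (5.24) and (5.26) define a whole class of quasi-Newton methods. We cite here the best known:

Broyden's first method ($c_k = \Delta x_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, \Delta x_k^T H_k}{\langle \Delta V_k, H_k^T \Delta x_k \rangle}; \qquad (5.27)$$

Broyden's second method ($d_k = \Delta V_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, \Delta V_k^T}{\langle \Delta V_k, \Delta V_k \rangle}; \qquad (5.28)$$

Pearson's method ($c_k = \Delta V_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, \Delta V_k^T H_k}{\langle \Delta V_k, H_k \Delta V_k \rangle}; \qquad (5.29)$$

McCormick's method ($d_k = \Delta x_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, \Delta x_k^T}{\langle \Delta V_k, \Delta x_k \rangle}; \qquad (5.30)$$

the symmetric rank-one method ($d_k = \Delta x_k - H_k \Delta V_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)(\Delta x_k - H_k \Delta V_k)^T}{\langle \Delta V_k, \Delta x_k - H_k \Delta V_k \rangle}. \qquad (5.31)$$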


In Thomas's method the computations are made by the formulas 5220) 


andes (onzG)- i ne which 

















d,dh 
lax, ll | 
where dy. = Riot 5 apts . is a Euclidean norm and 
Wo qh 
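A rank-one update of this family is straightforward to program. The Python sketch below is an illustration under assumptions: it combines the iteration (5.20) with the inverse update of Broyden's first method as written in (5.27), on a hypothetical system of two equations; the helper names and the choice of $H_0$ as the exact inverse Jacobian at the starting point are not from the text.

```python
def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def broyden_first(v, x0, h0, tol=1e-12, max_iter=50):
    """Iteration (5.20) with the inverse update (5.27).

    h0 is an initial approximation of the inverse Jacobian.
    """
    n = len(x0)
    x, h = list(x0), [row[:] for row in h0]
    vx = v(x)
    for _ in range(max_iter):
        if dot(vx, vx) ** 0.5 <= tol:
            break
        dx = [-s for s in matvec(h, vx)]        # x_{k+1} - x_k = -H_k V(x_k)
        x = [xi + di for xi, di in zip(x, dx)]
        v_new = v(x)
        dv = [a - b for a, b in zip(v_new, vx)]
        h_dv = matvec(h, dv)
        u = [a - b for a, b in zip(dx, h_dv)]   # dx - H dV
        dx_h = [dot(dx, [row[j] for row in h]) for j in range(n)]  # dx^T H
        denom = dot(dx, h_dv)                   # <dV, H^T dx>
        if denom == 0.0:
            break
        h = [[h[i][j] + u[i] * dx_h[j] / denom for j in range(n)]
             for i in range(n)]
        vx = v_new
    return x, vx


def V(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


# Inverse of the Jacobian [[4, 1], [0.5, 2]] at the starting point (2, 0.5).
H0 = [[2.0 / 7.5, -1.0 / 7.5], [-0.5 / 7.5, 4.0 / 7.5]]
x_broyden, v_final = broyden_first(V, [2.0, 0.5], H0)
```

No derivatives are evaluated after the start; each iteration costs one evaluation of $V$ and a few matrix-vector products.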


The matrix $V_x(x)$ is symmetric for some systems of the form (5.1). Hence it is useful to consider quasi-Newton methods which account for this characteristic. For these methods the quasi-Newton relation (5.21) is satisfied, and if the matrix $B_k$ is symmetric, so is the matrix $B_{k+1}$. We cite here the arguments used first by M. Powell [2] in obtaining a symmetric version of Broyden's first method and later by J. Dennis [1] in deriving formulas of a more general kind.
Let the matrix $B_k$ be symmetric. Then the approximation

$$B_{k,1} = B_k + \frac{(\Delta V_k - B_k \Delta x_k)\, c_k^T}{\langle \Delta x_k, c_k \rangle}$$

obtained by the formula (5.24) is not symmetric although it does satisfy the relation (5.21). The symmetry of this approximation can be obtained letting

$$B_{k,2} = \frac{B_{k,1} + B_{k,1}^T}{2},$$

and, since the matrix $B_{k,2}$ does not satisfy the relation (5.21), we shall continue the process described. In this case, we obtain a sequence of matrices $\{B_{k,i}\}_{i=0}^{\infty}$ such that

$$B_{k,2i+1} = B_{k,2i} + \frac{(\Delta V_k - B_{k,2i} \Delta x_k)\, c_k^T}{\langle c_k, \Delta x_k \rangle}, \qquad B_{k,2i+2} = \frac{B_{k,2i+1} + B_{k,2i+1}^T}{2}, \qquad i = 0, 1, 2, \dots,$$

where $B_{k,0} = B_k$. It turns out that $\{B_{k,i}\}_{i=0}^{\infty}$ has the limit

$$B_{k+1} = B_k + \frac{(\Delta V_k - B_k \Delta x_k)\, c_k^T + c_k (\Delta V_k - B_k \Delta x_k)^T}{\langle c_k, \Delta x_k \rangle} - \frac{\langle \Delta V_k - B_k \Delta x_k, \Delta x_k \rangle}{\langle c_k, \Delta x_k \rangle^2}\, c_k c_k^T. \qquad (5.33)$$
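It is not hard to see that the matrix $B_{k+1}$ is symmetric and satisfies the quasi-Newton relation (5.21). In the relation (5.19) we can get rid of the necessity of inverting the matrix $B_k$. To do this, we apply the Sherman-Morrison formula to (5.33) and obtain

$$B_{k+1}^{-1} = B_k^{-1} + \bigl[ \langle c_k, B_k^{-1} \Delta V_k \rangle \bigl( B_k^{-1} c_k (\Delta x_k - B_k^{-1} \Delta V_k)^T + (\Delta x_k - B_k^{-1} \Delta V_k)\, c_k^T B_k^{-1} \bigr)$$
$$- \langle \Delta V_k, \Delta x_k - B_k^{-1} \Delta V_k \rangle\, B_k^{-1} c_k c_k^T B_k^{-1} + \langle c_k, B_k^{-1} c_k \rangle (\Delta x_k - B_k^{-1} \Delta V_k)(\Delta x_k - B_k^{-1} \Delta V_k)^T \bigr]$$
$$\times \bigl[ \langle c_k, B_k^{-1} \Delta V_k \rangle^2 + \langle c_k, B_k^{-1} c_k \rangle \langle \Delta V_k, \Delta x_k - B_k^{-1} \Delta V_k \rangle \bigr]^{-1}. \qquad (5.34)$$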

Suppose we have a symmetric approximation $H_k$ of the matrix $[V_x^T(x_k)]^{-1}$. The sequence of the matrices $\{H_{k,i}\}_{i=0}^{\infty}$ obtained as a result of alternation of a quasi-Newton updating by the formula (5.26) and symmetrization converges to the matrix

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, d_k^T + d_k (\Delta x_k - H_k \Delta V_k)^T}{\langle d_k, \Delta V_k \rangle} - \frac{\langle \Delta x_k - H_k \Delta V_k, \Delta V_k \rangle}{\langle d_k, \Delta V_k \rangle^2}\, d_k d_k^T. \qquad (5.35)$$

It is seen that the matrix $H_{k+1}$ is symmetric and satisfies the quasi-Newton relation (5.22). We note that there is no such obvious connection between the formulas (5.33) and (5.35) as the one between (5.24) and (5.26) for $c_k = B_k^T d_k$.
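The relations (5.33) and (5.35) determine two intersecting (but not coinciding) classes of quasi-Newton methods. The following concrete methods correspond to the concrete values of the vectors $c_k$ and $d_k$:

the Davidon-Fletcher-Powell (DFP) method ($c_k = \Delta V_k$):

$$H_{k+1} = H_k + \frac{\Delta x_k \Delta x_k^T}{\langle \Delta x_k, \Delta V_k \rangle} - \frac{H_k \Delta V_k \Delta V_k^T H_k}{\langle \Delta V_k, H_k \Delta V_k \rangle}; \qquad (5.36)$$

the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method ($d_k = \Delta x_k$):

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta V_k)\, \Delta x_k^T + \Delta x_k (\Delta x_k - H_k \Delta V_k)^T}{\langle \Delta x_k, \Delta V_k \rangle} - \frac{\langle \Delta x_k - H_k \Delta V_k, \Delta V_k \rangle}{\langle \Delta x_k, \Delta V_k \rangle^2}\, \Delta x_k \Delta x_k^T; \qquad (5.37)$$

the Powell symmetric Broyden (PSB) method ($c_k = \Delta x_k$):

$$B_{k+1} = B_k + \frac{(\Delta V_k - B_k \Delta x_k)\, \Delta x_k^T + \Delta x_k (\Delta V_k - B_k \Delta x_k)^T}{\langle \Delta x_k, \Delta x_k \rangle} - \frac{\langle \Delta V_k - B_k \Delta x_k, \Delta x_k \rangle}{\langle \Delta x_k, \Delta x_k \rangle^2}\, \Delta x_k \Delta x_k^T. \qquad (5.38)$$

The last method can be described by an expression for $B_{k+1}^{-1}$ by the formula (5.34), in which one needs to take $c_k = \Delta x_k$ (we omit it here because it is too cumbersome).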
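Two properties claimed for the symmetric updates, symmetry of the new matrix and the quasi-Newton relation (5.22), can be verified numerically. The Python sketch below is an illustration under assumptions: it implements the general form (5.35) and the DFP formula (5.36) for small dense matrices, with hypothetical data and helper names.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]


def symmetric_update(h, dx, dv, d):
    """Update of the form (5.35) for a symmetric H_k and a chosen vector d_k."""
    n = len(dx)
    r = [a - b for a, b in zip(dx, matvec(h, dv))]  # dx - H dV
    s = dot(d, dv)
    t = dot(r, dv)
    rd, dr, dd = outer(r, d), outer(d, r), outer(d, d)
    return [[h[i][j] + (rd[i][j] + dr[i][j]) / s - t * dd[i][j] / (s * s)
             for j in range(n)] for i in range(n)]


def dfp_update(h, dx, dv):
    """Formula (5.36); H symmetric, so H dV dV^T H = (H dV)(H dV)^T."""
    n = len(dx)
    hdv = matvec(h, dv)
    a, b = outer(dx, dx), outer(hdv, hdv)
    s, q = dot(dx, dv), dot(dv, hdv)
    return [[h[i][j] + a[i][j] / s - b[i][j] / q for j in range(n)]
            for i in range(n)]


# Hypothetical data: symmetric H_k and one step (dx, dV).
H = [[2.0, 1.0], [1.0, 3.0]]
dx, dv = [1.0, 2.0], [3.0, 1.0]

H_bfgs = symmetric_update(H, dx, dv, d=dx)  # d_k = dx gives (5.37)
H_dfp = dfp_update(H, dx, dv)
```

Choosing `d=dx` in `symmetric_update` reproduces (5.37); other choices of `d` give other members of the class (5.35).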
We proceed to describe properties of quasi-Newton methods. D. Gay in [1] proves the convergence of the methods (5.24) and (5.26) to a root of the system of linear equations for any initial point $x_0 \in R^n$ in a number of steps not exceeding $2n$. In [1], D. Gay also proves the local $2n$-step quadratic convergence rate of the Broyden first and second methods for a nonlinear case. The local superlinear convergence to a solution of the system of nonlinear equations (5.1) has been proved for most of the methods cited. The next theorem has a great potential for the investigation of rates of convergence.
THEOREM 2.5.7 (J. Dennis and J. Moré [1]). Let the mapping $V: R^n \to R^n$ be continuously differentiable on an open convex set $D$. Let there exist a point $x_* \in D$ such that $V(x_*) = 0$ and let the matrix $V_x(x_*)$ be nonsingular. Assume that for some $x_0 \in D$ the sequence $\{x_k\}$, constructed in accord with (5.19), is contained in $D$, $x_k \ne x_*$ for all $k \ge 0$, and converges to $x_*$. In this case the sequence $x_k$ converges to $x_*$ superlinearly iff

$$\lim_{k \to \infty} \frac{\|[B_k - V_x^T(x_*)](x_{k+1} - x_k)\|}{\|x_{k+1} - x_k\|} = 0. \qquad (5.39)$$
Proof. Let the relation (5.39) be satisfied. We consider

$$[B_k - V_x^T(x_*)](x_{k+1} - x_k) = -V(x_k) - V_x^T(x_*)(x_{k+1} - x_k)$$
$$= V(x_{k+1}) - V(x_k) - V_x^T(x_*)(x_{k+1} - x_k) - V(x_{k+1}). \qquad (5.40)$$

From the continuity of the Jacobian $V_x$ at $x_*$ and the relation (5.39) it follows that

$$\lim_{k \to \infty} \frac{\|V(x_{k+1})\|}{\|x_{k+1} - x_k\|} = 0, \qquad (5.41)$$

and since $V(x_*) = 0$ and the matrix $V_x(x_*)$ is nonsingular, there exists $\beta > 0$ such that

$$\|V(x_{k+1})\| = \|V(x_{k+1}) - V(x_*)\| \ge \beta \|x_{k+1} - x_*\|.$$

Therefore,

$$\frac{\|V(x_{k+1})\|}{\|x_{k+1} - x_k\|} \ge \frac{\beta \|x_{k+1} - x_*\|}{\|x_{k+1} - x_*\| + \|x_k - x_*\|} = \frac{\beta \rho_k}{1 + \rho_k},$$

where $\rho_k = \|x_{k+1} - x_*\| / \|x_k - x_*\|$. Thus, from (5.41) we obtain the convergence $\rho_k / (1 + \rho_k) \to 0$ as $k \to \infty$, and therefore the convergence $\rho_k \to 0$, which was to be proved.

We assume now that the sequence $x_k \to x_*$ at a superlinear rate. Then

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x_k\|}{\|x_k - x_*\|} = 1, \qquad (5.42)$$

since

$$\left| \frac{\|x_{k+1} - x_k\|}{\|x_k - x_*\|} - \frac{\|x_k - x_*\|}{\|x_k - x_*\|} \right| \le \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|}.$$

Applying (5.42) to the relation

$$\frac{\|V(x_{k+1})\|}{\|x_{k+1} - x_k\|} = \frac{\|V(x_{k+1}) - V(x_*)\|}{\|x_k - x_*\|} \cdot \frac{\|x_k - x_*\|}{\|x_{k+1} - x_k\|}$$

and using the continuity of $V_x$ at $x_*$, we obtain (5.41). The latter implies (5.39) if we take into account (5.40). ///
As follows from the Theorem, for the superlinear convergence $x_k \to x_*$ it is not at all necessary that $B_k \to V_x^T(x_*)$. A sufficient condition is the convergence of the sequence $B_k \to V_x^T(x_*)$ only along the displacement $\Delta x_k$, which is characteristic of quasi-Newton methods.


The matrix $B_k$ may, generally speaking, never become equal to $V_x^T(x_*)$ even in the linear case ($V_x^T(x) = A$), where $\Delta V = A \Delta x$. Thus, after the first iteration we have $B_1 \Delta x_0 = \Delta V_0$, and after the next iteration $B_2 \Delta x_1 = \Delta V_1$ and not at all necessarily $B_2 \Delta x_0 = \Delta V_0$. In other words, on each iteration the current information is embedded in the matrix $B_{k+1}$; however, in this case the old information is lost in part. This creates the necessity for a method which, along with the equality (5.21), would preserve the relations

$$B_{k+1} \Delta x_i = \Delta V_i, \qquad k - n + 1 \le i \le k - 1, \quad i \ge 0,$$

i.e., would satisfy for all $k$ the relation

$$B_{k+1} \Delta x_i = \Delta V_i, \qquad k - n + 1 \le i \le k, \quad i \ge 0. \qquad (5.43)$$

For methods of the form (5.20) this means that

$$H_{k+1} \Delta V_i = \Delta x_i, \qquad k - n + 1 \le i \le k, \quad i \ge 0. \qquad (5.44)$$


This is a property of the sequential (n+1)-point secant method in
which $B_{k+1} = [\Delta V]_k\,[\Delta x]_k^{-1}$ (i.e. $H_{k+1} = [\Delta x]_k\,[\Delta V]_k^{-1}$), where the
(n×n)-matrices are

$$[\Delta x]_k = [\Delta x_k, \Delta x_{k-1}, \ldots, \Delta x_{k-n+1}],$$
$$[\Delta V]_k = [\Delta V_k, \Delta V_{k-1}, \ldots, \Delta V_{k-n+1}].$$

In the linear case ($V_x^T(x) \equiv A$), as is not hard to see,
$B_n = A$ (or $H_n = A^{-1}$). In the nonlinear case, if the matrices
$[\Delta x]_k$ and $[\Delta V]_k$ are nonsingular, the point $x_{k+1}$ will be the
unique root of the unique linear mapping having the values
$V(x_i)$ at the points $x_i$, $k-n \le i \le k$. This means
that the sequential (n+1)-point secant method is a generalization
of the standard (one-dimensional) secant method to the case of n
variables.
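The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the helper names (`solve_linear`, `secant_step`, `sequential_secant`), the test mapping and the starting points are hypothetical and not taken from the text. The step computed is $x_{k+1} = x_k - [\Delta x][\Delta V]^{-1} V(x_k)$ with the columns built from the n+1 most recent iterates. For a linear mapping $V(x) = Ax - b$ a single step from n+1 points in general position already gives the root, in agreement with $H_n = A^{-1}$.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][n] - s) / A[i][i]
    return z


def secant_step(points, values):
    """One secant step; points holds the n+1 latest iterates, newest first."""
    n = len(points) - 1
    xk, Vk = points[0], values[0]
    # Column j of dX (dV) is the difference of consecutive iterates (values).
    dX = [[points[j][r] - points[j + 1][r] for j in range(n)] for r in range(n)]
    dV = [[values[j][r] - values[j + 1][r] for j in range(n)] for r in range(n)]
    w = solve_linear(dV, Vk)  # w = [dV]^{-1} V(x_k)
    return [xk[r] - sum(dX[r][j] * w[j] for j in range(n)) for r in range(n)]


def sequential_secant(V, start_points, iters=20, tol=1e-12):
    points = [list(p) for p in start_points]
    values = [V(p) for p in points]
    for _ in range(iters):
        if max(abs(v) for v in values[0]) < tol:
            break
        x_new = secant_step(points, values)
        points = [x_new] + points[:-1]
        values = [V(x_new)] + values[:-1]
    return points[0]


# Hypothetical linear test mapping V(x) = A x - b with A = [[4, 1], [1, 3]], b = [1, 2].
def V_lin(x):
    return [4.0 * x[0] + x[1] - 1.0, x[0] + 3.0 * x[1] - 2.0]


x_sol = sequential_secant(V_lin, [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
```

The sketch does not guard against nearly dependent displacements, which is exactly the weakness of the method discussed later in this section.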
To avoid in the secant method the inversion of the matrix
$[\Delta x]_k$ (or $[\Delta V]_k$), it is possible to represent this matrix in
quasi-Newton form. To do this, in the formula (5.25) the vector
$c_k$ must be chosen such that

$$\langle c_k, \Delta x_i\rangle = 0, \quad k-n+1 \le i \le k-1; \qquad \langle c_k, \Delta x_k\rangle \ne 0. \tag{5.45}$$

This choice of $c_k$ guarantees that the relation (5.43) is satisfied.
Similarly, another representation of the secant method in
the form (5.26) is associated with

$$\langle d_k, \Delta V_i\rangle = 0, \quad k-n+1 \le i \le k-1; \qquad \langle d_k, \Delta V_k\rangle \ne 0, \tag{5.46}$$

which guarantees (5.44).

If the Jacobian $V_x^T(x)$ of the system (5.1) is symmetric, a
symmetric version of the secant method (see [1]) reflects this
specific feature most fully. In this case computations are made
throughe (S59 re Cons4) Cosa»). 

THEOREM 2.5.8 (L. Bittner [1], L. Tornheim [1]). Let the mapping
V satisfy the conditions of Theorem 2.5.7. We assume that
$\|x_0 - x_*\|$ is sufficiently small and for appropriate $\sigma > 0$, $r > 0$
the iterations of the sequential (n+1)-point secant method

$$x_{k+1} = x_k - [\Delta x]_{k-1}\,[\Delta V]_{k-1}^{-1}\,V(x_k)$$

are such that $\|[\Delta x]_k\| \le r$, and the matrices $[\Delta x]_k$ are uniformly
nonsingular, i.e.

$$\left| \det\left( \frac{\Delta x_k}{\|\Delta x_k\|}, \frac{\Delta x_{k-1}}{\|\Delta x_{k-1}\|}, \ldots, \frac{\Delta x_{k-n+1}}{\|\Delta x_{k-n+1}\|} \right) \right| \ge \sigma > 0. \tag{5.47}$$

Then $x_k \to x_*$ at a superlinear rate. If, in addition, the Lipschitz
inequality (5.6) is satisfied, the order of convergence is
not less than the unique positive root of the equation

$$t^{n+1} - t^n - 1 = 0.$$

Proof. We omit the proof; it can be found, for example, in Ortega
and W. Rheinboldt [1].

Together with the high rate of convergence and the need to
compute on each iteration only one value of the mapping V, the
sequential (n+1)-point secant method has the essential disadvantage
that the matrices $[\Delta x]_k$ are required to be uniformly nonsingular.
A sequence $\{x_k\}$ which does not satisfy the inequality
(5.47) can become the source of instability of the method.
Research of the last several years in this area has been oriented
mainly to construction of versions of the secant method stable
with respect to the linear dependence of the vectors $\{\Delta x_i\}$,
$k-n+1 \le i \le k$.
In this connection we ought to mention works of B. Gragg and G.
Stewart [1], D. Gay and R. Schnabel [1], J. Martinez [1], and
O.P. Burdakov [1].

The minimization of the function f(x) manifests some new
properties of methods of quasi-Newton type, of the form

$$x_{k+1} = x_k - \alpha_k H_k f_x(x_k). \tag{5.48}$$

It turns out that if the function f(x) is quadratic, that is,

$$f(x) = \tfrac{1}{2}\,x^T A x + b^T x,$$

where the matrix A is symmetric and strictly positive definite,
and the length of the step $\alpha_k$ is chosen from the condition for
exact minimum

$$\alpha_k = \arg\min_{\alpha} f(x_k - \alpha H_k f_x(x_k)),$$


some quasi-Newton methods -- such as, for example, the Davidon-
Fletcher-Powell method (5.36), the Broyden-Fletcher-Goldfarb-
Shanno method (5.37), the symmetric method of the first rank (5.31)
-- generate a sequence of conjugate directions. In this case

$$\langle \Delta x_i, A\,\Delta x_j\rangle = 0 \quad \forall\, i \ne j,$$

and, as a consequence, there is convergence to the point of the
minimum of a quadratic function in a number of steps not exceeding
n. If the method completes n steps, then $H_n = A^{-1}$. In the
case of minimization of a non-quadratic function this property
leads us to the n-step quadratic rate of convergence of the form

$$\|x_{k+n} - x_*\| \le c\,\|x_k - x_*\|^2. \tag{5.49}$$


If, in addition, for $k \ge n$ the directions $\{\Delta x_i\}$ are uniformly
linearly independent, one can prove a higher rate of convergence than
(5.49), of the order exceeding the unique positive root of the
equation $t^{n+1} - t^n - 1 = 0$. This estimation of the convergence
rate of quasi-Newton methods generating conjugate directions
coincides with the estimation described in Theorem 2.5.8 for the
sequential (n+1)-point secant method.
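The finite-termination property on a quadratic can be checked numerically with a short sketch. The code below is only an illustration under stated assumptions: it uses the textbook Davidon-Fletcher-Powell update, which is assumed here to be what formula (5.36) denotes, and a hypothetical quadratic whose gradient is written as $Ax - b$ (the sign of the linear term is a convention of this sketch, not of the text). The names `matvec`, `dot` and `dfp_quadratic` are introduced for the example only.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dfp_quadratic(A, b, x0):
    """Run n DFP steps with exact line search on a quadratic with gradient A x - b."""
    n = len(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    g = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    for _ in range(n):
        p = matvec(H, g)                          # the step is taken along -p
        alpha = dot(g, p) / dot(p, matvec(A, p))  # exact minimizer along -p
        x_new = [xi - alpha * pi for xi, pi in zip(x, p)]
        g_new = [ai - bi for ai, bi in zip(matvec(A, x_new), b)]
        dx = [u - v for u, v in zip(x_new, x)]
        dg = [u - v for u, v in zip(g_new, g)]
        Hdg = matvec(H, dg)
        s1, s2 = dot(dx, dg), dot(dg, Hdg)
        # Standard DFP update: H += dx dx^T / <dx, dg> - (H dg)(H dg)^T / <dg, H dg>
        for i in range(n):
            for j in range(n):
                H[i][j] += dx[i] * dx[j] / s1 - Hdg[i] * Hdg[j] / s2
        x, g = x_new, g_new
    return x, H


A_q = [[4.0, 1.0], [1.0, 3.0]]
b_q = [1.0, 2.0]
x_min, H_fin = dfp_quadratic(A_q, b_q, [2.0, 1.0])
```

After n = 2 steps the iterate coincides (up to rounding) with the minimizer, and the accumulated matrix agrees with $A^{-1}$, as the text states for methods generating conjugate directions.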
6. NUMERICAL METHODS FOR FINDING A MINIMAX


Several factors contributed to the appearance of numerical methods
for finding the minimax. First, many practical problems of making
decisions in conflict situations reduce to the problem of finding
the minimax, and numerical methods of solving these problems are
needed in a wide class of applications. Second, these methods are
useful in solving problems of nonlinear programming and optimal
control, which we will discuss here and again later on.

To begin, we discuss the simplest case where a local minimax
is being sought (see Definition 1.5.2), and then we discuss in
brief methods of finding global solutions.


1. METHODS FOR FINDING A LOCAL MINIMAX 


We assume that in the problem (1.5.2) the function F(x,y) is
defined and differentiable on $E^n \times E^m$. We seek the limit (as $t \to \infty$)
points of the solution of the following Cauchy problem:

$$\frac{dx}{dt} = -\varepsilon F_x(x,y), \qquad \frac{dy}{dt} = F_y(x,y), \qquad x(0) = x_0, \quad y(0) = y_0. \tag{6.1}$$

The discrete version of this system has the form

$$x_{k+1} = x_k - \alpha\varepsilon F_x(x_k, y_k), \qquad y_{k+1} = y_k + \alpha F_y(x_k, y_k). \tag{6.2}$$

Here $0 < \varepsilon \ll 1$ is a small parameter, and the integration step
is $\alpha > 0$. We shall show that the following theorem holds.


THEOREM 2.6.1 (N.I. Grachev, Yu.G. Evtushenko [4]). Let the
function F(x,y) be twice continuously differentiable in the
neighborhood of the point $z_* = [x_*, y_*]$, where sufficient conditions
for the local minimax, given in Theorem 1.5.7, are satisfied.
Then there exist $\bar\varepsilon > 0$, $\bar\alpha > 0$ such that for any fixed
$0 < \varepsilon < \bar\varepsilon$ and $0 < \alpha < \bar\alpha$ the solutions $x(\varepsilon, x_0, y_0, t)$,
$y(\varepsilon, x_0, y_0, t)$ of the system (6.1) and the iterations $x_k(\varepsilon, x_0, y_0)$,
$y_k(\varepsilon, x_0, y_0)$ in the scheme (6.2) converge locally to the point $z_*$.
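The role of the small parameter in the scheme (6.2) can be seen on a toy example. The following is a minimal sketch, assuming the hypothetical test function $F(x,y) = -\tfrac12 x^2 + 2xy - \tfrac12 y^2$, which is not taken from the text: it has a local minimax at the origin ($F_{yy} = -1 < 0$, $F_{xx} - F_{xy}F_{yy}^{-1}F_{yx} = 3 > 0$) but is not convex in x, so the origin is not a saddle point. The function name `minimax_iteration` and all numerical values are likewise assumptions of the sketch.

```python
def minimax_iteration(F_x, F_y, x, y, eps, alpha, steps):
    """Iterate scheme (6.2): slow descent in x, fast ascent in y."""
    for _ in range(steps):
        # Both partial derivatives are evaluated at the current point (x_k, y_k).
        x, y = x - alpha * eps * F_x(x, y), y + alpha * F_y(x, y)
    return x, y


# Partial derivatives of the hypothetical F(x, y) = -x^2/2 + 2xy - y^2/2.
F_x = lambda x, y: -x + 2.0 * y
F_y = lambda x, y: 2.0 * x - y

# With a small eps the iterates approach the minimax point (0, 0).
x_small, y_small = minimax_iteration(F_x, F_y, 1.0, 1.0, eps=0.1, alpha=0.1, steps=2000)

# With eps = 1 the same iteration moves away from the origin for this F.
x_one, y_one = minimax_iteration(F_x, F_y, 1.0, 1.0, eps=1.0, alpha=0.1, steps=2000)
```

For this particular F the linearized system is stable only for $\varepsilon < 1$, which illustrates why the theorem asks for a sufficiently small $\varepsilon$.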
The Theorem provides relatively simple methods for solving
the problem (1.5.2). The presence of slowly- and rapidly-varying
variables in (6.1) and (6.2) makes the computations complicated;
however, in some problems involving large dimensions, such an
approach is quite effective. We change in (6.1) the independent
variable $\tau = \varepsilon t$ and obtain

$$\frac{dx}{d\tau} = -F_x(x,y), \qquad \varepsilon\,\frac{dy}{d\tau} = F_y(x,y). \tag{6.3}$$

Systems of the form (6.3) have been studied in the theory of
singular perturbations of ordinary differential equations: A.N.
Tikhonov [1], A.A. Dorodnitsyn [1], E.F. Mishchenko and L.S.
Pontryagin [1]. A detailed bibliography is given in A.B.
Vasil'eva and V.F. Butuzov [1], and in E.F. Mishchenko and N.Kh.
Rozov [1].

Following the lines of the methods of singular perturbations,
one can consider the so-called degenerate system obtained from
(6.3), where $\varepsilon = 0$:

$$\frac{dx}{d\tau} = -F_x(x,y), \qquad F_y(x,y) = 0. \tag{6.4}$$

If the conditions of Theorem 1.5.8 are satisfied and the
function F is sufficiently smooth, the second equation defines
the unique function $y = g(x)$, with

$$g(x) = \operatorname{Arg}\max_{y \in E^m} F(x,y). \tag{6.5}$$

Substituting this expression into F(x,y), we obtain the following
maximum function:

$$\varphi(x) = F(x, g(x)). \tag{6.6}$$

Substituting $y = g(x)$ into the first equation of (6.4), we obtain
the system

$$\frac{dx}{d\tau} = -\frac{d\varphi}{dx} = -F_x(x, g(x)), \tag{6.7}$$

which coincides with the Cauchy method (see (2.3)), which has been
applied to the minimization of the function $\varphi(x)$. According to
the results of Section 2, for the local convergence of the method
(6.7) it is sufficient that the matrix

$$\varphi_{xx}(x) = F_{xx}(x, g(x)) - F_{xy}(x, g(x))\,F_{yy}^{-1}(x, g(x))\,F_{yx}(x, g(x)) \tag{6.8}$$

be positive definite at the point $x = x_*$.

In singular perturbation theory it has been proved that if $\varepsilon$
is sufficiently small and certain assumptions are satisfied, a
solution of the system (6.4) approximates the solution of (6.3). We
go backward: instead of solving the degenerate system (6.4),
which may be too complicated for large values of m, we integrate
the system (6.1) equivalent to the singularly perturbed system
(6.3).

Now we turn to proving the Theorem. We write the first
variational equation for the system (6.1):

$$\delta\dot z = M(\varepsilon, z_*)\,\delta z;$$

here

$$M(\varepsilon, z_*) = \begin{bmatrix} -\varepsilon F_{xx}(z_*) & -\varepsilon F_{xy}(z_*) \\ F_{yx}(z_*) & F_{yy}(z_*) \end{bmatrix},$$

$$\delta z = [\delta x, \delta y] \in E^{n+m}, \qquad \delta x = x(t) - x_*, \qquad \delta y = y(t) - y_*.$$


The continuous, bounded (as a function of $\varepsilon$) matrix $M(\varepsilon, z_*)$
satisfies the following condition: the roots of the characteristic
equation

$$|M(\varepsilon, z_*) - \lambda I_{n+m}| = 0 \tag{6.9}$$

as $\varepsilon \to 0$ split into two groups:
the first group of m roots are close to those of the equation

$$|F_{yy}(z_*) - \lambda I_m| = 0;$$

the second group of n "small" roots of order $\varepsilon\mu$, where $\mu$
are close to those of the equation

$$\begin{vmatrix} -F_{xx}(z_*) - \mu I_n & -F_{xy}(z_*) \\ F_{yx}(z_*) & F_{yy}(z_*) \end{vmatrix} = 0.$$

Multiplying the lower row-matrix on the left by $-F_{xy}(z_*)F_{yy}^{-1}(z_*)$
and subtracting this matrix from the upper matrix, we obtain that
$\mu$ is the root of the following equation:

$$|D(z_*) + \mu I_n| = 0, \qquad D(z) = F_{xx}(z) - F_{xy}(z)\,F_{yy}^{-1}(z)\,F_{yx}(z). \tag{6.10}$$

By the sufficient condition of the local minimax the matrix
$D(z_*)$ is positive definite, hence all the roots $\mu$ are real and
strictly negative. Therefore, for sufficiently small values of $\varepsilon$
all the roots of Equation (6.9) have strictly negative real parts.
By the theorem on stability of the first-order approximation, the
stationary point $z_*$ is an asymptotically stable equilibrium for
system (6.1). This implies the local convergence of the method
(6.1) to $z_*$, as well as the convergence of the difference version
(6.2) for sufficiently small values of $\alpha$ (see Section 3). ///

The methods (6.1) and (6.2) can be used to find local maximins
in the problem (1.5.1). In this case, however, one should
take $\varepsilon \gg 1$, i.e., $\varepsilon$ is regarded as a large parameter. When
the sufficient conditions for the local maximin given in Theorem
1.5.9 are satisfied, the solutions of (6.1) and (6.2) converge
locally to the local maximin if $\varepsilon$ is sufficiently large and $\alpha$
is sufficiently small. The small parameter $\varepsilon$ in (6.1) leads
us to the situation in which the variables x change at a considerably
slower rate than the variables y do. This makes the integration
of (6.1) more complicated. In the case where F(x,y) is
a strictly convex-concave function, in the problem (1.5.1) there
is a saddle point and we can put $\varepsilon = 1$, simplifying thereby the
calculations; the method (6.1) becomes thus the method (2.5) for
finding saddle points. The method (6.1) can be used, because it
is so simple, to solve elementary game problems. (We shall dwell
upon this issue in more detail later on, in Section 8 of Chapter
6.) A theorem on the global convergence of the method (6.1) has
been proved under the assumption that somewhat stronger requirements
than the conditions stated in Theorem 1.5.8 are satisfied
(see Yu.G. Evtushenko [12]).

A "rapid" motion along y can be made, using the Newton method:

$$\dot x = -\varepsilon F_x(x,y), \qquad \dot y = -F_{yy}^{-1}(x,y)\,F_y(x,y).$$

It is not hard to show that eigenvalues of the matrix in the
variation equation consist of n numbers close to the roots of (6.10)
plus m numbers close to -1. This implies the local convergence
of the method under usual assumptions.


2. REDUCTION TO THE PROBLEM OF FINDING SADDLE POINTS


In the problem (1.5.2) we replace the vector y by a new
m-dimensional vector p:

$$p = y - g(x), \tag{6.11}$$

where g(x) is defined by (6.5).
We consider next a new problem of finding an unconstrained
minimax:

$$\min_{x \in E^n} \max_{p \in E^m} B(x,p), \qquad B(x,p) = F(x, p + g(x)). \tag{6.12}$$

A remarkable property of the transformation made above is that
from the existence of the solution of the problem (1.5.2) it follows
that the problem (6.12) has a saddle point. Indeed, Theorem
1.5.1 implies the inequality

$$\max_{p \in E^m} \min_{x \in E^n} B(x,p) \le \min_{x \in E^n} \max_{p \in E^m} B(x,p).$$

On the other hand,

$$\max_{p \in E^m} \min_{x \in E^n} B(x,p) \ge \min_{x \in E^n} B(x,0) = \min_{x \in E^n} \max_{y \in E^m} F(x,y) = \min_{x \in E^n} \max_{p \in E^m} B(x,p).$$

Comparing this inequality with the preceding inequality, we obtain

$$\max_{p \in E^m} \min_{x \in E^n} B(x,p) = \min_{x \in E^n} \max_{p \in E^m} B(x,p),$$

i.e., the problem (6.12) has a saddle point.


Differentiating B(x,p) over x and p, we obtain the following
formulas:

$$B_x = F_x + g_x F_y, \qquad B_p = F_y, \qquad B_{xp} = F_{xy} + g_x F_{yy},$$

$$B_{xx} = F_{xx} + F_{xy}\,g_x^T + g_x\,(F_{yx} + F_{yy}\,g_x^T) + \sum_{i=1}^{m} \frac{\partial^2 g^i}{\partial x^2}\,\frac{\partial F}{\partial y^i}, \qquad B_{pp} = F_{yy}.$$

Here, for the sake of brevity, we omit the arguments of the functions.
Letting $p_* = y_* - g(x_*)$, we then obtain

$$B_x(x_*, 0) = F_x(z_*) = 0, \qquad B_p(x_*, 0) = F_y(z_*) = 0,$$

$$B_{pp}(x_*, 0) = F_{yy}(z_*) < 0, \qquad B_{xp}(x_*, 0) = 0, \qquad B_{xx}(x_*, 0) = D(z_*) > 0.$$

These formulas show that the transformation (6.11) has the following
property: the stationary points [x,y] of the function
F(x,y) turn into the stationary points [x,p] of the function
B(x,p); if at the point $[x_*, y_*]$ sufficient conditions for a
strict local minimax are satisfied, the point $[x_*, y_*]$ becomes
the point $[x_*, p_*]$, where the sufficient conditions for a strict
local saddle of the function B(x,p) are satisfied, and
$F(x_*, y_*) = B(x_*, 0)$, which allows us to use the well-known iterative
numerical methods for finding saddle points in order to find
the local minimax and maximin. We shall elucidate our point using
two simple methods by way of example.

To find saddle points of the function B(x,p), we use the
gradient method (2.5). We compose a system of n+m ordinary
differential equations

$$\frac{dx}{dt} = -B_x(x,p), \qquad \frac{dp}{dt} = B_p(x,p).$$

Next, in this system we go from the variables x, p to the
initial variables x, y. By doing this we transform the method
for finding a saddle point into a method for finding a local
minimax. We obtain

$$\dot x = -F_x - g_x F_y, \qquad \dot y = F_y + g_x^T\,\dot x. \tag{6.13}$$


Now we use another method. For the variable p we use the
Newton method, and for x, the gradient method:

$$\dot x = -B_x(x,p), \qquad \dot p = -B_{pp}^{-1}(x,p)\,B_p(x,p).$$

In terms of the initial variables the method has the form

$$\dot x = -F_x - g_x F_y, \qquad \dot y = -F_{yy}^{-1} F_y + g_x^T\,\dot x. \tag{6.14}$$


The practical application of the methods obtained becomes difficult
because the expression

$$g_x^T = -F_{yy}^{-1}(x, g(x))\,F_{yx}(x, g(x)) \tag{6.15}$$

contains the function g(x), which can be computed only by solving
a maximization problem. However, simplified versions of the
methods obtained are feasible: in (6.13) and (6.14) we can put

$$g_x^T = -F_{yy}^{-1}(z)\,F_{yx}(z). \tag{6.16}$$

In what follows we shall prove the convergence of these methods
and, furthermore, justify the following methods for finding a
local minimax:

$$\dot x = -F_x, \qquad \dot y = F_y - g_x^T F_x, \tag{6.17}$$

$$\dot x = -F_x - g_x F_y, \qquad \dot y = F_y, \tag{6.18}$$

$$\dot x = -F_x - g_x F_y, \qquad \dot y = -F_{yy}^{-1} F_y. \tag{6.19}$$


Here is a discrete version of the method (6.17):

$$x_{k+1} = x_k - \alpha F_x(x_k, y_k), \qquad y_{k+1} = y_k + \alpha\left[F_y(x_k, y_k) + F_{yy}^{-1}(x_k, y_k)\,F_{yx}(x_k, y_k)\,F_x(x_k, y_k)\right]. \tag{6.20}$$
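The iteration (6.20) is easy to try on a scalar example. The sketch below is an illustration only, assuming scalar x and y and the hypothetical function $F(x,y) = -\tfrac12 x^2 + 2xy - \tfrac12 y^2$ (not taken from the text), whose local minimax is the origin; the function name `iterate_620` and the step size are assumptions as well. In the scalar case the product $F_{yy}^{-1} F_{yx} F_x$ reduces to an ordinary quotient.

```python
def iterate_620(F_x, F_y, F_yx, F_yy, x, y, alpha, steps):
    """Scalar form of the discrete scheme (6.20)."""
    for _ in range(steps):
        fx = F_x(x, y)
        correction = F_yx(x, y) * fx / F_yy(x, y)  # F_yy^{-1} F_yx F_x for scalars
        x, y = x - alpha * fx, y + alpha * (F_y(x, y) + correction)
    return x, y


# Derivatives of the hypothetical F(x, y) = -x^2/2 + 2xy - y^2/2.
dF_x = lambda x, y: -x + 2.0 * y
dF_y = lambda x, y: 2.0 * x - y
dF_yx = lambda x, y: 2.0
dF_yy = lambda x, y: -1.0

x_end, y_end = iterate_620(dF_x, dF_y, dF_yx, dF_yy, 1.0, 1.0, alpha=0.1, steps=400)
```

For this F the linearized iteration has the eigenvalues of $F_{yy} = -1$ and of $-D = -3$, so no small parameter is needed, in contrast with the scheme (6.2).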
According to the results of Section 1.5, the stationarity
conditions (1.5.17) are necessary conditions for a local minimax
in the problem (1.5.2). Applying the Newton method to solving the
system (1.5.17), we obtain the following method for finding a
local minimax:

$$x_{k+1} = x_k - D^{-1}(x_k, y_k)\left[F_x(x_k, y_k) - F_{xy}(x_k, y_k)\,F_{yy}^{-1}(x_k, y_k)\,F_y(x_k, y_k)\right],$$

$$y_{k+1} = y_k - F_{yy}^{-1}(x_k, y_k)\left[F_y(x_k, y_k) + F_{yx}(x_k, y_k)\,(x_{k+1} - x_k)\right]. \tag{6.21}$$


3. PROOF OF CONVERGENCE


Assuming that in all of the methods cited above $g_x$ is calculated
by the simplified formula (6.16), we state and prove the following
theorem on convergence.
THEOREM 2.6.2. Let the sufficient conditions for a local minimax
be satisfied at the point $z_* = [x_*, y_*]$ (see Theorem 1.5.7). Then
1. solutions of the systems (6.13), (6.14), (6.17) - (6.19)
converge locally exponentially as $t \to \infty$ to the point $z_*$;
2. there exists $\bar\alpha > 0$ such that for any $0 < \alpha < \bar\alpha$ discrete
versions of these systems of the form (6.20) converge locally
linearly to the point $z_*$.


Proof. We use Theorem 2.1.1 on stability of the first-order
approximation. Let

$$\delta x(t) = x(t) - x_*, \qquad \delta y(t) = y(t) - y_*, \qquad \delta z = [\delta x, \delta y] \in E^{n+m}.$$

Dropping second-order terms, we obtain a variational equation


for the system (6.13):

$$\delta\dot z = B_1\,\delta z, \tag{6.22}$$

where $B_1$ denotes the square matrix of the dimension (n+m)
representable through four block-matrices:

$$B_1 = \begin{bmatrix} -D(z_*) & 0_{nm} \\ F_{yx}(z_*) + F_{yy}^{-1}(z_*)\,F_{yx}(z_*)\,D(z_*) & F_{yy}(z_*) \end{bmatrix}.$$

Elements of all the matrices are calculated at the point $z_*$. The
characteristic equation of the matrix $B_1$ has the form

$$|B_1 - \lambda I_{n+m}| = |-D(z_*) - \lambda I_n| \cdot |F_{yy}(z_*) - \lambda I_m| = 0.$$

Therefore, the eigenvalues of the matrix $B_1$ consist of the
eigenvalues of the matrices $F_{yy}(z_*)$ and $-D(z_*)$. Both matrices
are symmetric and negative definite. Therefore, all the eigenvalues
of the matrix $B_1$ are real and strictly negative. Hence we
conclude that the equilibrium point $z_*$ for the system (6.13) is
asymptotically stable.

The convergence of the other methods can be proved in a similar
way. We denote by $B_2, B_3, B_4, B_5$ the matrices of the variational
equations (6.22) for the methods (6.14), (6.17)-(6.19),
respectively. It is easy to show that the characteristic equations
are the following:

$$|B_2 - \lambda I_{n+m}| = \begin{vmatrix} -D - \lambda I_n & 0 \\ -F_{yy}^{-1}F_{yx} + F_{yy}^{-1}F_{yx}D & -(1+\lambda) I_m \end{vmatrix} = |-D - \lambda I_n| \cdot |-(1+\lambda) I_m| = 0,$$

$$|B_3 - \lambda I_{n+m}| = \begin{vmatrix} -F_{xx} - \lambda I_n & -F_{xy} \\ F_{yx} + F_{yy}^{-1}F_{yx}F_{xx} & F_{yy} + F_{yy}^{-1}F_{yx}F_{xy} - \lambda I_m \end{vmatrix} = 0,$$

$$|B_4 - \lambda I_{n+m}| = \begin{vmatrix} -D - \lambda I_n & 0 \\ F_{yx} & F_{yy} - \lambda I_m \end{vmatrix} = |-D - \lambda I_n| \cdot |F_{yy} - \lambda I_m| = 0,$$

$$|B_5 - \lambda I_{n+m}| = \begin{vmatrix} -D - \lambda I_n & 0 \\ -F_{yy}^{-1}F_{yx} & -(1+\lambda) I_m \end{vmatrix} = |-D - \lambda I_n| \cdot |-(1+\lambda) I_m| = 0.$$

These relations imply that all roots of the characteristic
equations for the matrices $B_2$, $B_4$, $B_5$ are real and negative. We
multiply the upper row of the block matrix $B_3 - \lambda I_{n+m}$ on the
left by $F_{yy}^{-1}F_{yx}$ and add the product to the lower row, the
determinant of the matrix being unchanged in this case:

$$|B_3 - \lambda I_{n+m}| = \begin{vmatrix} -F_{xx} - \lambda I_n & -F_{xy} \\ F_{yx} - \lambda F_{yy}^{-1}F_{yx} & F_{yy} - \lambda I_m \end{vmatrix}.$$

Next, we multiply the right column of the matrix obtained on the
right by $F_{yy}^{-1}F_{yx}$ and subtract the product from the left column.
Then

$$|B_3 - \lambda I_{n+m}| = \begin{vmatrix} -D - \lambda I_n & -F_{xy} \\ 0 & F_{yy} - \lambda I_m \end{vmatrix} = |-D - \lambda I_n| \cdot |F_{yy} - \lambda I_m| = 0.$$

From this we infer that the eigenvalues of the matrix $B_3$ are
real, negative, and the method (6.17) converges exponentially to
the stationary point $z_*$.

If $\alpha$ is sufficiently small, then by Theorem 2.3.7 the
convergence of discrete versions of the methods follows from the
proofs given above of the convergence of continuous versions of
the methods. ///

We note that one can weaken the conditions of the Theorem in
the case of the methods (6.14) and (6.19) and require that
$F_{yy}(z_*)$ be nondegenerate instead of $F_{yy}(z_*)$ being negative
definite. This property is, however, not essential; hence we shall
not mention it in the sequel.

The convergence of the Newton method (6.21) follows from the
general Theorem 2.5.3; according to the Remark we reformulate
Theorem 2.5.3 as follows.

THEOREM 2.6.3. Let the function F(z) be twice differentiable in
the neighborhood of the stationary point $z_*$, and let the matrices
$D(z_*)$ and $F_{yy}(z_*)$ be nondegenerate and the matrix $F_{zz}(z)$
satisfy a Lipschitz condition in the neighborhood of $z_*$. Then the
iterations (6.21) converge to $z_*$ locally, at a quadratic rate.

The methods described above can be used to obtain diverse
modifications. For example, let $y = g(x)$ on the right-hand side
of the first equation in (6.13). Then we obtain

$$\dot x = -F_x(x, g(x)), \qquad \dot y = F_y(x, y) + g_x^T\,\dot x. \tag{6.23}$$


In the methods (6.13), (6.14), (6.17) - (6.19) the derivative
of the function g(x) can be found from (6.15). When the conditions
of Theorem 1.5.7 are satisfied, the methods will still converge
to $z_*$. This follows from the fact that for the variational
equations the eigenvalues are the same as those in the case where
the formula (6.16) is used. Numerical realization of such methods
is, however, much more cumbersome since for calculating the
right-hand sides of systems of differential equations an additional
problem of maximization of F(x,y) in y must be solved.

The employment of the function g(x) in the methods (6.13), (6.14) does
not improve the quality of the methods applied. In the method
(6.23), on the other hand, a property of global convergence
develops, as well as in the following method:


Roe ooh Cx, et a) Vn CS a aey (6.24) 


THEOREM 2.6.4. Let F(x,y) and g(x) be everywhere continuously
differentiable in all arguments. Let for any $x \in E^n$ the function
F(x,y) be strictly concave in y and the function F(x, g(x))
be strictly convex in x. Then the methods (6.23), (6.24) converge
as $t \to \infty$ globally to the stationary point $z_*$, which is
the unique global solution of the problem (1.5.2).


Proof. We show that the stationary point $z_*$, being the equilibrium
point for both systems, is globally asymptotically stable.
We define the positive definite functions

$$v_1(x,y) = \tfrac{1}{2}\|x - x_*\|^2 + \tfrac{1}{2}\|y - g(x)\|^2 \ge 0,$$

$$v_2(x,y) = \tfrac{1}{2}\|x - x_*\|^2 + F(x_*, y_*) - F(x_*, y) \ge 0,$$

which are zero at the stationary point $z = z_*$ only. Obviously,
the function $v_1$ is infinitely large. The function $v_2$ is
strictly convex in x and y, and, by Theorem 1.1.2, is also
infinitely large.

Differentiating $v_1$ and $v_2$, using the systems (6.23) and
(6.24), respectively, and using the convexity and concavity
conditions, we obtain


dos F(x, g(x))s He —O+<Fy (ts Ys Y—- BEDS 
KF (%e, yo) —F (x, g(X)) FF (x, WF (%, 8 (0) <0, 


fe <F (x4, x) —F (8, B()) TF (tes YF (Hm BCH) <0, 
$\dot v_1 = \dot v_2 = 0$ only for $x = x_*$, $y = y_* = g(x_*)$. Hence, by the
Barbashin-Krassovskij method, the equilibrium point $z_*$ for
the systems (6.23) and (6.24) is globally asymptotically stable;
and therefore solutions of these systems will only converge to $z_*$
for any initial Cauchy data.

To apply the last two methods it is necessary to determine
values of the function g(x). Multiple computation of g(x) can
be replaced by integration of a system of differential equations.
Indeed, for any $y = g(x)$ a necessary condition for the extremum
$F_y(x, g(x)) = 0$ must be satisfied. Differentiating this relation
over x, we obtain that $y = g(x)$ is a solution of the following
Cauchy problem:

$$g_x = -F_{xy}(x, g(x))\,F_{yy}^{-1}(x, g(x)), \qquad g(x_0) = y_0. \tag{6.25}$$
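A scalar illustration of replacing repeated inner maximizations by the integration of (6.25) is given below. It is only a sketch under stated assumptions: x and y are scalars, the hypothetical function is $F(x,y) = xy - \tfrac14 y^4 - \tfrac12 y^2$ (so that $g(x)$ solves $y^3 + y = x$, $g(0) = 0$), and a classical fourth-order Runge-Kutta scheme is used for the integration. The function name `g_by_integration` and the step count are choices of the example, not of the text.

```python
def g_by_integration(F_xy, F_yy, x0, y0, x1, steps=200):
    """Integrate dg/dx = -F_xy(x, g) / F_yy(x, g) from x0 to x1 (scalar case)."""
    h = (x1 - x0) / steps
    rhs = lambda x, g: -F_xy(x, g) / F_yy(x, g)
    x, g = x0, y0
    for _ in range(steps):
        # Classical Runge-Kutta step of fourth order.
        k1 = rhs(x, g)
        k2 = rhs(x + h / 2.0, g + h * k1 / 2.0)
        k3 = rhs(x + h / 2.0, g + h * k2 / 2.0)
        k4 = rhs(x + h, g + h * k3)
        g += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return g


# Second derivatives of the hypothetical F(x, y) = x*y - y^4/4 - y^2/2.
mixed = lambda x, y: 1.0                     # F_xy
second = lambda x, y: -3.0 * y * y - 1.0     # F_yy

g_at_2 = g_by_integration(mixed, second, 0.0, 0.0, 2.0)
```

Since $1^3 + 1 = 2$, the exact value is $g(2) = 1$, which the integration reproduces to high accuracy without solving any maximization problem along the way.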


These methods have two advantages: first, they are relatively
simple and, second, they enable one to find a solution with a high
accuracy. Their disadvantage, however, is the need to impose
rigorous constraints on the function F(x,y). If these methods are
applied to arbitrary functions, the results obtained via these
methods ought to be regarded only as preliminary. In order to
assert that the solutions found thereby are global, it is necessary
to study the problem more fully, or to make sequential
calculations via global methods.

All of the methods given in this section can be used to find
saddle points. Indeed, if at the point $z_*$ sufficient conditions
for the strict saddle are satisfied: $F_{yy}(z_*) < 0$, $F_{xx}(z_*) > 0$,
then necessarily $D(z_*) > 0$ at this point, and the conditions of
Theorem 1.5.7 are a priori satisfied.
Next we give two methods intended specifically for finding
saddle points:

$$\dot x = -F_x(x, g(x)), \qquad \dot y = F_y(d(y), y), \tag{6.26}$$

$$\dot x = d(y) - x, \qquad \dot y = g(x) - y. \tag{6.27}$$

Here $d(y) = \operatorname{Arg}\min_{x \in E^n} F(x, y)$.
Assume that for any $x \in E^n$, $y \in E^m$, $x \ne x_*$, $y \ne y_*$, we
have the strict global saddle-point condition (1.5.16). To prove
the convergence of these methods, we construct the positive
definite function

$$v(x,y) = F(x, g(x)) - F(d(y), y).$$

If the function F is strictly convex-concave, the functions
d(y) and g(x) are differentiable. Computing the derivative of
the function v by the systems (6.26) and (6.27), we obtain
respectively

$$\dot v = -\|F_x(x, g(x))\|^2 - \|F_y(d(y), y)\|^2 < 0,$$

$$\dot v = \langle F_x(x, g(x)), d(y) - x\rangle + \langle F_y(d(y), y), y - g(x)\rangle \le F(d(y), g(x)) - F(x, g(x)) + F(d(y), y) - F(d(y), g(x)) = -v.$$

Therefore, the methods converge, and in the second method

$$v(x(t), y(t)) \le v(x_0, y_0)\,e^{-t}.$$
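The exponential estimate for the method (6.27) can be observed on a simple convex-concave example. The sketch below assumes scalar variables, the hypothetical function $F(x,y) = \tfrac12 x^2 + xy - \tfrac12 y^2$ (for which $d(y) = -y$ and $g(x) = x$), and an explicit Euler discretization with a small step; none of these choices comes from the text, and `saddle_flow` is a name introduced for the example.

```python
import math


def saddle_flow(d, g, x, y, h, steps):
    """Explicit Euler integration of system (6.27): x' = d(y) - x, y' = g(x) - y."""
    for _ in range(steps):
        x, y = x + h * (d(y) - x), y + h * (g(x) - y)
    return x, y


# Hypothetical strictly convex-concave function and its best responses.
F = lambda x, y: 0.5 * x * x + x * y - 0.5 * y * y
d = lambda y: -y        # minimizer of F(., y)
g = lambda x: x         # maximizer of F(x, .)
v = lambda x, y: F(x, g(x)) - F(d(y), y)   # the Lyapunov function from the text

x0, y0, T, h = 1.0, -2.0, 10.0, 0.01
xT, yT = saddle_flow(d, g, x0, y0, h, int(round(T / h)))
v0, vT = v(x0, y0), v(xT, yT)
```

For this F one finds $v = x^2 + y^2$, and the computed value `vT` stays below the bound $v(x_0, y_0)e^{-T}$.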
4. METHODS FOR FINDING LOCAL SOLUTIONS IN x
AND GLOBAL SOLUTIONS IN y

If in the problem (1.5.2) it becomes somehow possible to solve an
interior problem for each $x \in E^n$, the problem of finding the
minimax reduces to the problem of seeking the unconstrained minimum
of the following function of many variables:

$$x_* \in \operatorname{Arg}\min_{x \in E^n} \varphi(x). \tag{6.28}$$

The function g(x) is determined either analytically or
from the solution of the maximization problem, or from the integration
of the system (6.25). In the case where the dimension of the
vector y is small, it is possible to construct a grid on the set
of admissible values of the vector y and find the values of the
function $\varphi(x)$ from the array of the values $F(x, y_i)$ on this
grid.

Abstracting from solving an interior problem, we arrive at
the problem (6.28), for the solution of which one has to construct
a sequence of points $x_k$ converging to $x_*$. First we consider
the case where the function $\varphi(x)$ is differentiable and its
derivatives can be determined by the formulas (6.7) and (6.8). We
denote by $m_1, m_2, m_3, m_4$ the smallest eigenvalues of the following
symmetric matrices, respectively: $F_{yy}(z_*)$, $F_{xx}(z_*)$,
$F_{xy}(z_*)F_{yx}(z_*)$, $\varphi_{xx}(x_*)$.


THEOREM 2.6.5. Let the conditions of Theorem 1.5.7 be satisfied
at the point $z_* = [x_*, y_*]$. Then we have the relation

$$m_4 \ge m_2 - \frac{m_3}{m_1} \ge m_2.$$

Proof. Since the matrix $F_{yy}(z_*)$ is negative definite, we have for
any nonzero vectors $y \in E^m$

$$m_1\|y\|^2 \le y^T F_{yy}(z_*)\,y < 0, \qquad y^T F_{yy}^{-1}(z_*)\,y \le \|y\|^2 / m_1 < 0.$$

Letting $y = F_{yx}(z_*)\,x$, we obtain

$$x^T \varphi_{xx}(x_*)\,x \ge m_2\|x\|^2 - y^T F_{yy}^{-1}(z_*)\,y \ge \left(m_2 - \frac{m_3}{m_1}\right)\|x\|^2,$$

implying that $m_4 \ge m_2 - m_3/m_1$. Taking into account that the
smallest eigenvalue of the matrix $F_{xy}(z_*)F_{yx}(z_*)$ is nonnegative and
that $m_1 < 0$, we arrive at the required inequality. ///


For many numerical minimization methods the convexity of the
objective function $\varphi(x)$ in the neighborhood of the point $x_*$ is
of importance. It follows from the inequality obtained that for
$m_2 > 0$ the function $\varphi(x)$ is convex. This property can hold
even in those cases where the initial function $F(x, y_*)$ is not
convex in the neighborhood of the point $x_*$. We shall use this
property in Chapter 4 in solving nonlinear programming problems.
To solve (6.28), we apply the simplest methods of unconstrained
minimization:


1. The gradient method with a constant step

$$x_{k+1} = x_k - \alpha\,\varphi_x(x_k).$$

If the function $\varphi(x)$ is differentiable, $m_4 > 0$, the method
converges locally for any $0 < \alpha < \bar\alpha$. Here $\bar\alpha = 2/M$, M being the
largest eigenvalue of the matrix $\varphi_{xx}(x_*)$;

2. The method of steepest descent
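$$x_{k+1} = x_k - \alpha_k\,\varphi_x(x_k),$$

where $\alpha_k = \operatorname{Arg}\min_{\alpha \ge 0} \varphi(x_k - \alpha\,\varphi_x(x_k))$. Assuming that $\varphi_x(x_k)$
is small and dropping third order quantities in $\alpha$, we obtain

$$\varphi(x_k - \alpha\,\varphi_x(x_k)) = \varphi(x_k) - \alpha\,\|\varphi_x(x_k)\|^2 + \frac{\alpha^2}{2}\,\varphi_x^T(x_k)\,\varphi_{xx}(x_k)\,\varphi_x(x_k);$$

minimizing the right-hand side in $\alpha$, we obtain

$$\alpha_k = \frac{\|\varphi_x(x_k)\|^2}{a_k}, \qquad a_k = \varphi_x^T(x_k)\,\varphi_{xx}(x_k)\,\varphi_x(x_k).$$

This yields the estimate

$$\varphi(x_{k+1}) \le \varphi(x_k) - \frac{\|\varphi_x(x_k)\|^4}{2a_k},$$

ensuring relaxation of the minimization process for a minimax
definite function;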


3. The Newton method

$$x_{k+1} = x_k - \varphi_{xx}^{-1}(x_k)\,\varphi_x(x_k).$$

If the matrix $\varphi_{xx}(x)$ satisfies a Lipschitz condition and the
conditions of Theorem 1.5.7 are satisfied, the Newton method
converges locally at a quadratic rate;


4. The generalized gradient method (see the formula (4.5))
ta ae aap ie 4. YH r HL S 9 (x, ) 


To minimize $\varphi(x)$, many other numerical methods of unconstrained
minimization can be used.

In those cases when the interior problem has no unique solution,
the function $\varphi(x)$ is no longer differentiable, and the method
of steepest descent and the Newton method are not applicable. Then
one has to use either general methods of nonsmooth optimization,
or develop special versions for the minimization of maximum
functions. The latter possibility has been discussed in V.F. Dem'yanov
and V.N. Malozemov [1].
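For the differentiable case, the gradient method with a constant step described above can be sketched as follows. The example is an assumption-laden illustration: it uses the hypothetical scalar function $F(x,y) = -\tfrac12 x^2 + 2xy - \tfrac12 y^2$ with the analytically known $g(x) = 2x$, so that $\varphi(x) = \tfrac32 x^2$ and, by (6.7), $\varphi_x(x) = F_x(x, g(x)) = 3x$. The names `phi_grad` and `gradient_method` are introduced only for this sketch.

```python
def phi_grad(x):
    """Gradient of the maximum function via (6.7): phi_x(x) = F_x(x, g(x))."""
    g = 2.0 * x                 # maximizer of the hypothetical F(x, .) in y
    return -x + 2.0 * g         # F_x(x, y) = -x + 2y evaluated at y = g(x)


def gradient_method(grad, x, alpha, steps):
    """Gradient method with a constant step alpha."""
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x


# Here phi_xx = 3, so the bound from the text is alpha_bar = 2 / M = 2 / 3.
x_conv = gradient_method(phi_grad, 1.0, alpha=0.5, steps=60)   # alpha < 2/3
x_div = gradient_method(phi_grad, 1.0, alpha=0.7, steps=60)    # alpha > 2/3
```

Note that in this example $F(x, y_*)$ is concave in x ($m_2 = -1$) while $\varphi$ is convex ($m_4 = 3$), which is the situation the theorem above allows for. The step bound $\bar\alpha = 2/M$ separates the convergent run from the divergent one.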


Chapter 3 


THE PENALTY FUNCTION METHOD 


The penalty function method is one of the best-known numerical
methods of nonlinear programming. The idea of the method is simple
and quite universal, which explains why this method is widely
used in solving various extremal problems. Many variations on
the method have been suggested, and new ones still continue to
appear. A detailed bibliography of the non-Soviet literature
can be found in A. Fiacco and G. McCormick [1].
The many modifications currently available notwithstanding,
they still do not exhaust all the possible applications of this
method; and more studies in this direction would be potentially
fruitful. At the same time, it should be noted that practical
computations for specific problems have revealed an essential
disadvantage of the method: it is unsuited for solving problems
where high accuracy is needed. The use of large values of the
penalty coefficient leads to the minimization of ill-conditioned
functions, which complicates the computations considerably. Many
authors have justly criticized the method for this reason. One
needs, however, to balance this with the strong points of the
method. First, the domain of convergence is frequently of an
essentially larger size than in other methods; second, the
computational schemes for implementing the method are characterized by
their extreme simplicity. For these reasons the penalty function
method is indispensable for finding initial, approximate solutions.
However, if the computations must provide a higher degree
of accuracy of the solution, it is more appropriate to employ
other methods of rapid convergence (for instance, the methods
described in Chapter 4), using as an initial approximation the
results of the penalty function method.

We start our presentation of the method with the more or less
traditional version, and then go over to its modifications. As
will be evident from our further discussion, the penalty function
method can include also the cost-function parametrization method
and its modifications, described in Section 3 of this chapter.


1. THE EXTERIOR PENALTY FUNCTION METHOD 


1. THE GENERAL IDEA OF THE METHOD 


We consider the nonlinear programming problem (1.6.1). We say
that the function S(x) defined on $E^n$ is a penalty function if
the following three conditions are satisfied:

1. the function S(x) is everywhere continuous on $E^n$;

2. $S(x) = 0$ for any $x \in X$;

3. $S(x) > 0$ for any $x \notin X$.


In the case where the "feasible set" X is defined by the
condition (1.6.2), the penalty function is usually constructed of
the form

$$S(x) = \sum_i \varphi(g^i(x)) + \sum_j \varphi(h_+^j(x)). \tag{1.1}$$

Here the continuous function $\varphi(q)$ is such that $\varphi(0) = 0$ and
$\varphi(q) > 0$ for all $q \ne 0$. Typical choices for $\varphi(\cdot)$ are the
functions
P(N=9, (=a, PQM=lql, e(g)=er—1. 


We say that an additive penalty function of the form (1.1) is
separable. A simple example of a non-separable function is given
by

$$S^1(x) = \max_{i \in [1:e],\; j \in [1:c]} \left[\, |g^i(x)|,\; h^j_+(x) \,\right]. \tag{1.2}$$


We introduce now an auxiliary function:

$$P(x,\tau) = f(x) + \tau S(x), \tag{1.3}$$

where $\tau$ is a positive parameter referred to as the penalty
coefficient.

The penalty function method is in essence the following.
One chooses some monotonically increasing sequence
$\tau_1 < \tau_2 < \tau_3 < \dots$ and solves the unconstrained
minimization problem for the function $P(x,\tau_k)$ in $x$ for
$k = 1, 2, 3, \dots$. One then obtains a sequence of points
$\{x_k\}$ satisfying the condition

$$x_k \in \operatorname*{Arg\,min}_{x \in E^n} P(x,\tau_k). \tag{1.4}$$


If $\tau_k \to \infty$, under certain conditions each convergent
subsequence of the sequence $\{x_k\}$ converges to a point of the
set of solutions $X_*$ of the problem (1.6.1). If for some finite
value of $\tau_k$ one obtains that $x_k \in X$, the initial
problem (1.6.1) has been solved, since in this case the point
$x_k$ belongs necessarily to the set of solutions $X_*$. Cases of
this kind, with special penalty functions used, will be considered
in Section 3.2. In the general case the sequence $\{x_k\}$ is
infinite. The justification of the convergence of the method will
follow from the proof, given in Subsection 1.3, of the convergence
of the first, simplified version of the penalty function method.

The penalty function $S(x)$ introduced is nonzero everywhere
outside the feasible set. Hence this function is usually called
an exterior penalty function (exterior penalty), and the
reduction of the problem (1.6.1) to the sequence of problems of
the unconstrained minimization of auxiliary functions of the form
(1.4) is called the exterior penalty function method or the
exterior point method. Another type of penalties, called interior
penalties, will be described in Section 3.4 below.


Here is a simple example that illustrates the method described
above. Let

$$f(x) = x, \quad g(x) = x^2, \quad e = 1, \quad c = 0, \quad n = 1. \tag{1.5}$$

The solution of this problem is the point $x_* = 0$. We construct
the auxiliary function as

$$P(x,\tau) = x + \tau x^4;$$

from the necessary minimum condition

$$P_x(x,\tau) = 1 + 4\tau x^3 = 0$$

we find the dependence $x(\tau) = -(4\tau)^{-1/3}$. This shows that
$x(\tau) \to 0$ as $\tau \to \infty$: the method really leads to a
solution of the problem.
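The behaviour of this example can be checked numerically. The sketch below is only an illustration (the function names and the particular values of the penalty coefficient are choices made here, not part of the text): it evaluates the closed-form stationary point of $P(x,\tau) = x + \tau x^4$ for growing $\tau$ and confirms that the derivative vanishes there while the point drifts toward the solution $x_* = 0$.

```python
# Exterior penalty method on the one-dimensional example
#   f(x) = x,  g(x) = x**2 = 0,  so that  P(x, tau) = x + tau * x**4.

def penalty_objective(x, tau):
    """Auxiliary function P(x, tau) = f(x) + tau * S(x) with S(x) = g(x)**2."""
    return x + tau * x ** 4


def penalty_gradient(x, tau):
    """Derivative of P with respect to x."""
    return 1.0 + 4.0 * tau * x ** 3


def minimizer(tau):
    """Closed-form root of 1 + 4*tau*x**3 = 0."""
    return -((4.0 * tau) ** (-1.0 / 3.0))


# Illustrative sequence of penalty coefficients (an arbitrary choice).
taus = [1.0, 1e2, 1e4, 1e6]
points = [minimizer(tau) for tau in taus]

for tau, x in zip(taus, points):
    print(f"tau = {tau:9.0e}   x(tau) = {x:+.6f}   P' = {penalty_gradient(x, tau):+.1e}")
```

The slow rate $|x(\tau)| \sim \tau^{-1/3}$ visible in the output is one concrete form of the accuracy limitation mentioned at the start of the chapter: each extra digit costs a thousandfold increase in $\tau$.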


2. COMPUTATIONAL ASPECTS


In the numerical implementation, the computing time goes mostly
to finding the points $x_k$ minimizing the function (1.3). To
solve the initial problem more accurately, one should increase
the value of the penalty coefficient $\tau$. However, the increase
of $\tau$ leads to the situation that $P(x,\tau)$ as a function of
$x$ has the shape of a ridge, since in the neighborhood of the
boundary of the feasible set the function $\tau S(x)$ changes
abruptly from zero (on the feasible set) to large values outside
the feasible set. Any numerical minimization procedure for these
functions is extremely complicated; hence it is preferable to
increase the penalty coefficient slowly: for some $\tau_k$ find the
point $x_k$; then, minimizing $P(x,\tau_{k+1})$, take $x = x_k$ as
an initial approximation. The knowledge of a good initial
approximation makes it easier to find the unconstrained minimum.
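The warm-start strategy just described can be sketched as follows. This is a minimal illustration under assumptions made here: the test problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the quadratic penalty $S(x) = [\max(0, x-1)]^2$, the plain gradient iteration and all names are hypothetical choices, not the author's implementation.

```python
# Continuation in the penalty coefficient with warm starts.
# Illustrative problem (an assumption, not from the text):
#   minimize f(x) = (x - 2)**2  subject to  h(x) = x - 1 <= 0,
#   S(x) = max(0, h(x))**2, so P(x, tau) is minimized at (2 + tau) / (1 + tau).

def grad_P(x, tau):
    """Derivative of P(x, tau) = (x - 2)**2 + tau * max(0, x - 1)**2."""
    return 2.0 * (x - 2.0) + 2.0 * tau * max(0.0, x - 1.0)


def minimize(x0, tau, tol=1e-10, max_iter=1_000_000):
    """Gradient descent with step 1/L, where L = 2 + 2*tau bounds the curvature of P."""
    step = 1.0 / (2.0 + 2.0 * tau)
    x, iters = x0, 0
    while abs(grad_P(x, tau)) > tol and iters < max_iter:
        x -= step * grad_P(x, tau)
        iters += 1
    return x, iters


def continuation(taus, x0=0.0):
    """Solve the auxiliary problems in turn, starting each from the previous x_k."""
    x, history = x0, []
    for tau in taus:
        x, iters = minimize(x, tau)
        history.append((tau, x, iters))
    return history


history = continuation([1.0, 10.0, 100.0, 1000.0])
cold_x, cold_iters = minimize(0.0, 1000.0)   # same final tau, no warm start

for tau, x, iters in history:
    print(f"tau = {tau:7.1f}   x_k = {x:.6f}   iterations = {iters}")
print(f"cold start at tau = 1000: {cold_iters} iterations")
```

In this toy problem the step length shrinks like $1/\tau$, so a cold start from a distant point needs many small steps, whereas each warm-started stage begins close to its minimizer.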

In practice it is also important that all the functions
specifying the constraints be sufficiently well scaled. The
nonlinear programming problem (1.6.1) does not change character if
in the conditions $g(x) = 0$, $h(x) \le 0$ some of the constraints
are multiplied by an arbitrarily large positive number. In
numerical computations this operation means that these particular
constraints are accounted for, while the remaining constraints are
strongly violated. Hence, while using this method one should
have the possibility of "scaling" the constraints by
multiplication by suitable "weight" coefficients. In many cases,
however, the increased "weight" of the constraints fails to reach
the goal.


For example, consider the problem where

$$f(x) = x^3, \quad g(x) = x, \quad e = 1, \quad c = 0, \quad n = 1.$$

The point $x_* = 0$ is the solution of this problem. We use the
simplest auxiliary function

$$P(x,\tau) = x^3 + \tau x^2.$$

It is easy to see that for any $\tau$

$$\inf_{x \in E^1} P(x,\tau) = -\infty.$$

Thus, in the auxiliary problem the minimum is unattainable
for any penalty coefficient. In our example the cost function
$f(x)$ on the infeasible set decreases as $x \to -\infty$
considerably faster than the penalty function grows. To make the
method work, a stronger penalty is needed, i.e., one that grows
faster than $|x|^3$ as $x \to -\infty$ (for instance, a penalty
term $\tau x^4$ in place of $\tau x^2$); then the auxiliary
problems will have solutions for $\tau > 0$, and the method
ensures that the problem is solved.
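A small numerical check of this failure mode is given below. It is written for the reading $f(x) = x^3$, $g(x) = x$ of the example, which is an assumption about the garbled source, and the quartic term $\tau x^4$ is just one hypothetical "stronger" penalty chosen for illustration.

```python
# Unbounded versus bounded auxiliary functions for f(x) = x**3, g(x) = x (assumed reading).

def weak(x, tau):
    """P(x, tau) = x**3 + tau * x**2: the cubic dominates as x -> -infinity."""
    return x ** 3 + tau * x ** 2


def strong(x, tau):
    """P(x, tau) = x**3 + tau * x**4: an illustrative faster-growing penalty."""
    return x ** 3 + tau * x ** 4


tau = 10.0

# The weak auxiliary function keeps decreasing along x = -10**k.
far_left = [-(10.0 ** k) for k in range(2, 6)]
weak_values = [weak(x, tau) for x in far_left]

# The strong one has the stationary point x = -3 / (4 * tau), which is its global minimum.
x_min = -3.0 / (4.0 * tau)
lower_bound = strong(x_min, tau)            # equals -27 / (256 * tau**3)
grid = [-5.0 + 0.001 * i for i in range(10001)]
strong_values = [strong(x, tau) for x in grid]

print("weak penalty values  :", weak_values)
print("strong penalty minimum:", x_min, lower_bound)
```

As $\tau$ grows, the minimizer $-3/(4\tau)$ of the second function tends to the solution $x_* = 0$.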

As is evident from this example, to make the penalty function
method work efficiently it may be necessary in some cases to make
a preliminary analysis of the problem and choose non-standard
penalty functions. In the cases where the user (a skilled
mathematician) has a program already available, it may be
necessary to modify it. However, it is simpler to change the
technique of specifying the functions in the initial problem.
Assume for instance that in our last example the feasible set is
given by the condition $g(x) = e^{x^2} - 1 = 0$ rather than by the
condition $g(x) = x = 0$. The feasible set does not change, and
there is no need to make any changes in the program in use. At
the same time, a stronger dependence of $g(x)$ on $x$ than that
given in the initial problem allows one to be sure that the
auxiliary problem (1.4) has a solution.

If the functions defining the problem (1.6.1) are sufficiently
smooth, it is then desirable that the function $P$ possess the
same degree of smoothness, since in this case one can use for its
minimization methods which have a high rate of convergence. One
can take, for example,

$$P(x,\tau) = f(x) + \tau \left[ \sum_{i=1}^{e} [g^i(x)]^2 + \sum_{j=1}^{c} [h^j_+(x)]^4 \right].$$

If the functions $f$, $g$, $h$ are twice differentiable in $x$, the
function $P(x,\tau)$ is also twice differentiable in $x$. A
drawback of this function is that for large values of $|g^i|$ and
$h^j$ the nature of dependence of $P(x,\tau)$ on constraints of the
equality type is different from that of the inequality type; to
remove this difference one can construct a penalty function in a
more artful way, letting, for example,

$$S(x) = \sum_{i=1}^{e} [g^i(x)]^2 + \sum_{j=1}^{c} \psi(h^j(x)). \tag{1.6}$$

Here

$$\psi(y) = \begin{cases} 0, & y \le 0, \\ k y^3, & 0 \le y \le \eta, \\ y^2 + k_0 y + k_1, & y \ge \eta, \end{cases}$$

$\eta > 0$ being sufficiently small, and $k$, $k_0$, $k_1$ being
defined from the continuity conditions for the function $\psi$ and
its first and second derivatives at $y = \eta$:

$$k = \frac{1}{3\eta}, \quad k_0 = -\eta, \quad k_1 = \frac{\eta^2}{3}.$$

For large values of $|g^i|$, $h^j$ the dependence of $S$ on
$|g^i|$, $h^j$ is identical -- namely, quadratic. This function is
widely used in numerical computations.
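A smoothed inequality penalty of this kind can be sketched in code. The version below assumes a cubic inner piece $k y^3$ on $[0, \eta]$ joined to $y^2 + k_0 y + k_1$ for $y \ge \eta$; this specific form, the constants derived from it and all names are assumptions made for illustration, chosen so that the function and its first two derivatives are continuous.

```python
# Twice continuously differentiable one-sided penalty (illustrative construction).

def make_psi(eta):
    """Return psi and its first two derivatives for a given small eta > 0."""
    k = 1.0 / (3.0 * eta)        # matches second derivatives at y = eta: 6*k*eta = 2
    k0 = -eta                    # matches first derivatives: 3*k*eta**2 = 2*eta + k0
    k1 = eta * eta / 3.0         # matches values: k*eta**3 = eta**2 + k0*eta + k1

    def psi(y):
        if y <= 0.0:
            return 0.0
        if y <= eta:
            return k * y ** 3
        return y * y + k0 * y + k1

    def dpsi(y):
        if y <= 0.0:
            return 0.0
        if y <= eta:
            return 3.0 * k * y * y
        return 2.0 * y + k0

    def d2psi(y):
        if y <= 0.0:
            return 0.0
        if y <= eta:
            return 6.0 * k * y
        return 2.0

    return psi, dpsi, d2psi


eta = 1e-2
psi, dpsi, d2psi = make_psi(eta)
left, right = eta * (1.0 - 1e-9), eta * (1.0 + 1e-9)
print(psi(left), psi(right))
print(dpsi(left), dpsi(right))
print(d2psi(left), d2psi(right))
```

Far from the boundary the function behaves like $y^2$, so violated inequalities are penalized on the same quadratic scale as the equality terms $[g^i(x)]^2$.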

The penalty function method makes it possible to replace the
initial problem of constrained minimization by that of solving a
sequence of unconstrained minimization problems. This technique
extends the domain for seeking the minimum in $x$, "removing" the
constraints, which is quite convenient for finding local
solutions. For finding global solutions, the method makes it
necessary to find global minima of (1.4) on $E^n$, which is in
general much more difficult than solving the problem (1.6.1).
Indeed, if the set $X$ is bounded, the initial nonlinear
programming problem offers an opportunity to construct coverings
of the set $X$, using some auxiliary sets (see Chapter 7), while
using the penalty function method one needs to seek global minima
in the entire space $E^n$, which complicates the whole process of
seeking the extremum. Thus, the method of exterior penalties is
effective only in finding local solutions, but has no potential
application in finding global solutions.

For the nonlinear programming problem (1.6.29) the feasible
set is the intersection of the sets $X$ and $U$. When the penalty
function method is used to solve problems of this kind, the
function $S(x)$ "penalizes" as before for the violation of the
condition $x \in X$, and instead of (1.4) one finds

$$x(\tau) \in \operatorname*{Arg\,min}_{x \in U} \left[ f(x) + \tau S(x) \right].$$

As usual, $\tau$ tends to infinity. Thus, the condition
$x \in U$ is accounted for through the substitution of a
minimization problem on the set $U$ for the auxiliary problem of
unconstrained minimization. The same generalization can be made
for other versions of the penalty function method listed in this
chapter.
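When $U$ is a simple set such as a box, the auxiliary minimization on $U$ can be carried out, for instance, by a projected gradient iteration. The sketch below is a hypothetical illustration: the test problem, the box $U = [0,1] \times [0,0.25]$, the projected gradient solver and every name are assumptions introduced here, not taken from the text.

```python
# Penalty on the equality constraint, explicit handling of the box U.
# Illustrative problem (an assumption):
#   minimize (x1 - 2)**2 + (x2 - 2)**2,  g(x) = x1 + x2 - 1 = 0,
#   U = {0 <= x1 <= 1, 0 <= x2 <= 0.25};  the solution is (0.75, 0.25).

def project(x):
    """Euclidean projection onto the box U."""
    return [min(1.0, max(0.0, x[0])), min(0.25, max(0.0, x[1]))]


def grad_P(x, tau):
    """Gradient of P(x, tau) = f(x) + tau * g(x)**2."""
    g = x[0] + x[1] - 1.0
    return [2.0 * (x[0] - 2.0) + 2.0 * tau * g,
            2.0 * (x[1] - 2.0) + 2.0 * tau * g]


def minimize_on_U(x, tau, iters=5000):
    """Projected gradient with step 1/L, L = 2 + 4*tau (largest Hessian eigenvalue)."""
    step = 1.0 / (2.0 + 4.0 * tau)
    for _ in range(iters):
        gr = grad_P(x, tau)
        x = project([x[0] - step * gr[0], x[1] - step * gr[1]])
    return x


x = [0.0, 0.0]
trajectory = []
for tau in (4.0, 16.0, 64.0):
    x = minimize_on_U(x, tau)          # warm start from the previous stage
    trajectory.append((tau, x[:]))
    print(f"tau = {tau:5.1f}   x(tau) = ({x[0]:.6f}, {x[1]:.6f})")
```

For this problem the stage minimizers are $((2 + 0.75\tau)/(1+\tau),\ 0.25)$ for $\tau \ge 4$, so the bound $x_2 \le 0.25$ is satisfied exactly at every stage while only the penalized equality is approached in the limit.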


3. THE FIRST SIMPLIFIED VERSION OF THE PENALTY FUNCTION METHOD 


Intuitively it is seen that in numerical computations using the
penalty function method, auxiliary problems of unconstrained
minimization can be solved approximately. It is then advisable to
increase the accuracy of the computations as $\tau$ grows, as the
points $x_k$ approach the set $X_*$. This makes the implementation
of the method somewhat simpler.

We consider now three nonnegative, continuous functions
$\mu(t)$, $\nu(t)$, $\tau(t)$ of the scalar argument $t \ge 0$. We
write the auxiliary function as

$$P(x,t) = \mu(t) f(x) + \tau(t) S(x). \tag{1.7}$$


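The functions $\mu(t)$, $\nu(t)$, $\tau(t)$ for any $t \ge 0$ and
$\delta > 0$ satisfy the following conditions:

$$\nu(t) \ge 0, \quad \mu(t) > 0, \quad \tau(t) > 0, \quad \lim_{t \to \infty} \frac{\nu(t)}{\mu(t)} = 0,$$

$$\lim_{t \to \infty} \frac{\tau(t)}{\mu(t)} = \infty, \quad \frac{\tau(t+\delta)}{\mu(t+\delta)} \ge \frac{\tau(t)}{\mu(t)}, \quad \frac{\nu(t+\delta)}{\mu(t+\delta)} \le \frac{\nu(t)}{\mu(t)}. \tag{1.8}$$

Thus, the ratios $\tau(t)/\mu(t)$, $\mu(t)/\nu(t)$ monotonically
tend to infinity as $t \to \infty$. We define the set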


$$\Omega(t) = \{ x(t) : P(x(t),t) \le \min_{x \in E^n} P(x,t) + \nu(t) \},$$

which consists of the points furnishing the minimum of the
auxiliary function $P(x,t)$ in $x$ with an error $\nu(t)$ (with
respect to the value of the function).

The first simplified version of the penalty function method
is the following. A sequence $\{t_k\}$ is to be constructed whose
elements satisfy the conditions

$$0 \le t_1 < t_2 < \dots < t_k < \dots, \qquad \lim_{k \to \infty} t_k = \infty.$$

For this sequence, a sequence of arbitrary points
$x_k \in \Omega(t_k)$ is to be defined, from which the convergent
subsequences have to be culled. Under certain conditions, all the
limit points belong to the set $X_*$.


We introduce the auxiliary set

$$B(t) = \{ x \in E^n : P(x,t) \le \mu(t) f(x_*) + \nu(t),\; x_* \in X_* \}.$$

THEOREM 3.1.1. Let the functions $f(x)$ and $S(x)$ be
continuous everywhere on $E^n$; let the set of solutions $X_*$ of
the problem (1.6.1) be non-empty, and let the set $B(0)$ be
non-empty and bounded. Also, let the conditions (1.8) be satisfied
for all $t \ge 0$. Then the first simplified version of the penalty
function method converges to $X_*$ globally.

Proof. We show that the sequence of sets $\{B(t)\}$ as
$t \to \infty$ is contracting, i.e., we have the inclusion
$B(t+\delta) \subseteq B(t)$ for any $\delta \ge 0$ and $t \ge 0$.
Let $x \in B(t+\delta)$; then, using the conditions (1.8), for any
$x_* \in X_*$ we obtain

$$\mu(t) f(x_*) + \nu(t) = \mu(t) \left[ f(x_*) + \frac{\nu(t)}{\mu(t)} \right] \ge \frac{\mu(t)}{\mu(t+\delta)} \left[ \mu(t+\delta) f(x_*) + \nu(t+\delta) \right] \ge$$

$$\ge \frac{\mu(t)}{\mu(t+\delta)} P(x, t+\delta) \ge \mu(t) f(x) + \tau(t) S(x) = P(x,t).$$

Therefore, $x \in B(t)$. In particular, $B(t) \subseteq B(0)$ for
all $t \ge 0$: the sequence of the sets $B(t)$ is bounded uniformly
in $t$. We have the inclusion $X_* \subseteq B(t) \subseteq B(0)$,
hence $B(t)$ for all $t \ge 0$ is non-empty and bounded, and $X_*$
also is bounded. The function $P(x,t)$, continuous in $x$, attains
its minimum on the compact set $B(t)$, where the quantity
$p(t) = \min_{x \in B(t)} P(x,t)$ is defined, with the conditions

$$p(t) \le P(x_*, t) = \mu(t) f(x_*)$$

satisfied. Thus, the set $\Omega(t)$ also is non-empty and bounded.
Moreover, as follows from its definition, for any $t \ge 0$ we
have the inclusion $\Omega(t) \subseteq B(t)$.
By $\rho(t)$ we denote the distance between the sets $B(t)$ and
$X_*$:

$$\rho(t) = \max_{x \in B(t)} \min_{y \in X_*} \|x - y\|. \tag{1.9}$$

The fact that the sequence of the sets $B(t)$ is contracting
implies that $\rho(t)$ is a monotonically decreasing function of
$t$; hence, by Lemma 2.4.1, the limit
$\lim_{t \to \infty} \rho(t) = a \ge 0$ exists. We prove that
$a = 0$. For each $t \ge 0$ it is possible to define at least one
point $x(t) \in B(t)$ at which the maximin condition is satisfied
in (1.9). The set of all $B(t)$ for all possible values of $t$ is
bounded. Hence, from the bounded set of the points $x(t_k)$ we can
extract a subsequence $\{x(t_s)\}$ converging to some point
$\bar{x}$, corresponding to the monotonically increasing
subsequence $\{t_s\} \subset \{t_k\}$. In this case the conditions

$$\min_{y \in X_*} \|\bar{x} - y\| = a, \qquad \lim_{s \to \infty} x(t_s) = \bar{x}, \qquad x(t_s) \in B(t_s)$$

need to be satisfied. For all $t_s \ge 0$, $x(t_s) \in B(t_s)$ we
have the inequality

$$f(x(t_s)) + \frac{\tau(t_s)}{\mu(t_s)} S(x(t_s)) \le f(x_*) + \frac{\nu(t_s)}{\mu(t_s)}.$$

The ratio $\nu(t_s)/\mu(t_s)$ tends to zero as $t_s \to \infty$.
Let $t_s$ go to infinity; then

$$f(\bar{x}) + d \le f(x_*). \tag{1.10}$$

Here we have used

$$d = \lim_{s \to \infty} \frac{\tau(t_s)}{\mu(t_s)} S(x(t_s)).$$
Let $d > 0$; then the fact that

$$\lim_{s \to \infty} \tau(t_s)/\mu(t_s) = \infty$$

yields $S(\bar{x}) = 0$. But the properties of the penalty function
imply in this case that $\bar{x} \in X$; and from the inequality
(1.10) we obtain that at the feasible point $\bar{x}$ the value of
the objective function $f$ is strictly smaller than the value of
$f(x_*)$, where $x_* \in X_*$, which contradicts the definition of
the set $X_*$; hence the case $d > 0$ is impossible. The quantity
$d$ cannot be negative since the functions $\tau(t)$, $\mu(t)$,
$S(x)$ are nonnegative for $t \ge 0$; hence $d = 0$,
$f(\bar{x}) = f(x_*)$, $\bar{x} \in X_*$. Therefore

$$\lim_{s \to \infty} \rho(t_s) = \min_{y \in X_*} \|\bar{x} - y\| = a = 0.$$

But if for the monotonically decreasing function $\rho(t)$ a
subsequence $\rho(t_s)$ has been found for which the limit equal to
zero exists, then the sequence $\rho(t_k)$ has the same limit. This
implies that $\lim_{k \to \infty} B(t_k) = X_*$. For each $t \ge 0$
we have the inclusion $\Omega(t) \subseteq B(t)$, hence the
convergence of $B(t)$ to $X_*$ implies the convergence of
$\Omega(t)$ to $X_*$. ///


4. THE SECOND SIMPLIFIED VERSION OF THE PENALTY FUNCTION METHOD 


The calculations made by the formulas of Subsection 3.1.3 are
simpler than those made by the initial scheme, but they have a
very essential defect: it is frequently difficult to guarantee
that the unconstrained minimum is found with the specified degree
of accuracy. Hence the accuracy of solution of the problem (1.4)
will be verified in a different way. For each fixed $t_k$ the
process of minimizing the auxiliary function $P(x,t_k)$ terminates
as soon as some point $x_k$ satisfying the condition

$$\|P_x(x_k, t_k)\| = \|\mu(t_k) f_x(x_k) + \tau(t_k) S_x(x_k)\| \le \nu(t_k) \tag{1.11}$$

has been found. Here we need to assume that the functions $f(x)$
and $S(x)$ are differentiable; however, the condition (1.11) is
easily verifiable. This technique has been suggested and justified
independently by several authors: M. A. Kostina [2],
R. Mifflin [1], Yu. G. Evtushenko [9]. Following the lines of
[9], we state and prove a theorem on convergence.
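A gradient-norm stopping rule of this type is easy to sketch. In the illustration below everything specific is an assumption made here: the test problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$), the choices $\mu(t) = 1$, $\tau(t) = t$, $\nu(t) = 1/t$, $\varphi(q) = q^2$, the plain gradient iteration, and the multiplier estimate $u_k = \tau(t_k)\varphi'(g(x_k))/\mu(t_k)$.

```python
import math

# Inexact penalty stages terminated by a gradient-norm test.
# Illustrative problem: f(x) = x1**2 + x2**2, g(x) = x1 + x2 - 1 = 0,
# exact solution x* = (0.5, 0.5) with multiplier u* = -1.

def grad_P(x, t):
    """Gradient of P(x, t) = f(x) + t * g(x)**2 (mu = 1, tau = t, phi(q) = q**2)."""
    g = x[0] + x[1] - 1.0
    return [2.0 * x[0] + 2.0 * t * g, 2.0 * x[1] + 2.0 * t * g]


def solve(ts, x0, max_iter=1_000_000):
    x, out = list(x0), []
    for t in ts:
        nu = 1.0 / t                       # stage tolerance nu(t)
        step = 1.0 / (2.0 + 4.0 * t)       # 1 / largest Hessian eigenvalue of P
        grad, iters = grad_P(x, t), 0
        while math.hypot(grad[0], grad[1]) > nu and iters < max_iter:
            x = [x[0] - step * grad[0], x[1] - step * grad[1]]
            grad = grad_P(x, t)
            iters += 1
        u = 2.0 * t * (x[0] + x[1] - 1.0)  # multiplier estimate tau * phi'(g) / mu
        out.append((t, x[:], u, iters))
    return out


stages = solve([1.0, 10.0, 100.0, 1000.0], [2.0, -1.0])
for t, x, u, iters in stages:
    print(f"t = {t:7.1f}  x = ({x[0]:.4f}, {x[1]:.4f})  u = {u:+.4f}  iterations = {iters}")
```

Each stage stops as soon as the gradient test passes, so early stages are cheap and only the last ones are solved accurately; the multiplier estimate approaches $-1$ together with $x_k \to (0.5, 0.5)$.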

We assume that the penalty function $S(x)$ is representable
in the form (1.1), where the function $\varphi(q)$ is continuously
differentiable over $q \in E^1$ and such that
CGS pag). Se e0 12-904 =, 0 
GAL ales 
OCG) Ore Coe | Une, at 7 0 


If the function $\varphi$ is substituted in (1.1), the conditions
for the penalty functions $S(x)$, as formulated in Subsection
3.1.1, will be satisfied. Let

$$u^i_k = \frac{\tau(t_k)\, \varphi'(g^i(x_k))}{\mu(t_k)}, \qquad v^j_k = \frac{\tau(t_k)\, \varphi'(h^j_+(x_k))}{\mu(t_k)}, \qquad i \in [1:e],\; j \in [1:c]. \tag{1.13}$$


THEOREM 3.1.2. Let the functions $f(x)$ and $S(x)$ be
continuously differentiable on $E^n$, let the penalty function
$S(x)$ be separable, and let the conditions (1.8) and (1.12) hold.
We assume that the constraints $g(x) = 0$, $h(x) \le 0$ satisfy the
strengthened constraint qualification and that the sequence
$\{x_k\}$ obtainable from (1.11) is bounded. Then the set of limit
points of the sequence $\{x_k, u_k, v_k\}$ as $t_k \to \infty$ is
not empty, and each limit point is a Kuhn-Tucker point for the
problem (1.6.1).


Proof. Using (1.13), we write the gradient of the auxiliary
function as

$$P_x(x_k, t_k) = \mu(t_k) \left[ f_x(x_k) + g_x(x_k) u_k + h_x(x_k) v_k \right],$$

where $v_k \ge 0$ for all $k$.


The sequence $\{x_k\}$ is bounded, hence we can extract a
subsequence $\{x_s\}$ converging to $\bar{x}$. We show that the
point $\bar{x}$ is feasible. Assume the opposite:
$\bar{x} \notin X$. We divide both sides of the inequality (1.11)
by $\tau(t_s)$ and, letting $t_s \to \infty$, we obtain that

$$\sum_{i=1}^{e} \varphi'(g^i(\bar{x}))\, g^i_x(\bar{x}) + \sum_{j=1}^{c} \varphi'(h^j_+(\bar{x}))\, h^j_x(\bar{x}) = 0. \tag{1.14}$$

Now we take the dot-product by the vector $\bar{z}$ from the
definition 1.7.5 of the strengthened constraint qualification.
Simple transformations give us

$$\sum_{i=1}^{e} \varphi'(g^i(\bar{x}))\, g^i(\bar{x}) + \sum_{j \in \sigma(\bar{x})} \varphi'(h^j_+(\bar{x}))\, h^j(\bar{x}) \le 0, \tag{1.15}$$

where we have introduced the index set

$$\sigma(\bar{x}) = \{ j \in [1:c] : h^j(\bar{x}) \ge 0 \}.$$

At the same time the conditions (1.12) imply that the function
$\varphi(q)$ is convex and $\varphi(0) = 0$, hence

$$\varphi'(g^i(\bar{x}))\, g^i(\bar{x}) \ge 0, \qquad \varphi'(h^j_+(\bar{x}))\, h^j(\bar{x}) \ge 0$$

for all $i \in [1:e]$, $j \in \sigma(\bar{x})$. Comparing these
inequalities with (1.15), we come to the conclusion that for the
$i$ and $j$ indicated above the equalities

$$\varphi'(g^i(\bar{x}))\, g^i(\bar{x}) = \varphi'(h^j_+(\bar{x}))\, h^j(\bar{x}) = 0 \tag{1.16}$$

are satisfied. It follows from the condition $\bar{x} \notin X$
that we can find either at least one $i$ such that
$g^i(\bar{x}) \ne 0$, or $j \in [1:c]$ such that
$h^j(\bar{x}) > 0$. But then, by (1.12), either
$\varphi'(g^i(\bar{x}))\, g^i(\bar{x}) > 0$ or
$\varphi'(h^j_+(\bar{x}))\, h^j(\bar{x}) > 0$, which contradicts
(1.16). Hence $\bar{x} \in X$.

Next we prove the boundedness of the sequence $\{u_k, v_k\}$.
Assume the opposite and let

$$w_k = \sum_{i=1}^{e} |u^i_k| + \sum_{j=1}^{c} v^j_k \to \infty \quad \text{as } t_k \to \infty.$$

We divide both sides of the inequality (1.11) by $\mu(t_k) w_k$
and pass to the limit as $t_k \to \infty$. Then we can find
$\alpha^i$, $\beta^j \ge 0$ ($i \in [1:e]$,
$j \in \sigma(\bar{x})$), of which at least one is not zero, and
such that

$$\sum_{i=1}^{e} \alpha^i g^i_x(\bar{x}) + \sum_{j \in \sigma(\bar{x})} \beta^j h^j_x(\bar{x}) = 0.$$

This contradicts the linear independence of the vectors
$g^i_x(\bar{x})$, $h^j_x(\bar{x})$, $i \in [1:e]$,
$j \in \sigma(\bar{x})$. Hence all the limit points of the
sequence $\{x_k, u_k, v_k\}$ are finite. If for some
$j \in [1:c]$

$$\lim_{k \to \infty} h^j(x_k) = h^j(\bar{x}) < 0,$$

then, by (1.13), we have $\lim_{k \to \infty} v^j_k = 0$. Hence at
the limit points we have the complementarity condition, and each
limit point of the sequence $\{x_k, u_k, v_k\}$ is a Kuhn-Tucker
point. ///
Let (1.6.1) be a convex programming problem. Then it follows
from Theorem 1.1.7 that the auxiliary function $P(x,t_k)$ is
convex in $x$ and, by Theorem 1.2.6, we have the inequality

$$P(x_k, t_k) - P(x_*, t_k) \le \langle P_x(x_k, t_k),\, x_k - x_* \rangle.$$

Using the Cauchy-Bunyakowski inequality and noting (1.11), we
obtain

$$P(x_k, t_k) - \mu(t_k) f(x_*) \le \nu(t_k) \|x_k - x_*\|.$$

Because $\{x_k\}$ is bounded, it then follows that as
$t_k \to \infty$ the function

$$\frac{\tau(t_k)}{\mu(t_k)} \left[ \sum_{i=1}^{e} \varphi(g^i(x_k)) + \sum_{j=1}^{c} \varphi(h^j_+(x_k)) \right]$$

is bounded as well. But the fact that
$\tau(t_k)/\mu(t_k) \to \infty$ implies that the quantity in the
square brackets tends to zero. Hence $\bar{x} \in X$.

Thus, for convex programming problems, in order to justify the
convergence of the method there is no need to invoke the
strengthened constraint qualification -- it suffices to introduce
Slater's or Karlin's constraint qualification and require that the
vectors $g^i_x(\bar{x})$, $h^j_x(\bar{x})$ be linearly
independent, $i \in [1:e]$, $j \in \sigma(\bar{x})$.
5. THE THIRD MODIFICATION OF THE PENALTY FUNCTION METHOD


The two modifications given above simplify the computations to a
certain degree; however, for the problems (1.6.1) in which the
vector $x$ has large dimension, repeated solving of the auxiliary
problem (1.4) presents considerable difficulties. It is easier
to implement the method if one gives up solving the problem (1.4)
and, instead, seeks the limit points of the solutions of the
following Cauchy problem:

$$\frac{dx}{dt} = -P_x(x,t), \qquad x(0) = x_0, \tag{1.17}$$

where $P(x,t)$ is defined by the formula (1.7). This version of
the method is similar to the Cauchy method (2.2.3): the
difference is that now $\mu(t)$, $\tau(t)$ depend on $t$, so that
the system (1.17) is non-autonomous. To justify the convergence
of the method we introduce the set

$$B(t) = \{ x \in E^n : P(x,t) \le \mu(t) f(x_*) \}.$$

We assume that the functions $\mu(t)$, $\tau(t)$ are continuous and
satisfy for all $t \ge 0$ the conditions

$$\mu(t) > 0, \qquad \tau(t) > 0, \qquad \int_0^{\infty} \mu(t)\,dt = \infty. \tag{1.18}$$

THEOREM 3.1.3. Let $f(x)$ and $S(x)$ be convex functions
continuously differentiable everywhere on $E^n$, let the set $X_*$
be non-empty, and let the set $B(0)$ be non-empty and bounded.
Also, let the continuous functions $\mu(t)$, $\tau(t)$ for all
$t \ge 0$ satisfy the condition (1.18), and let the ratio
$\tau(t)/\mu(t) \to \infty$ be monotonic as $t \to \infty$. Then
the method (1.17) converges to $X_*$ globally.
Proof. In the same way as in proving Theorem 3.1.1, we can show
here that for any $t \ge 0$, $\delta \ge 0$ we have the inclusions
$X_* \subseteq B(t+\delta) \subseteq B(t)$; hence $X_*$ is compact.
We introduce the function

$$w(x) = [\operatorname{dis}(x, X_*)]^2 = \min_{y \in X_*} \|x - y\|^2.$$

Next, we differentiate the function $w(x(t))$, using the system
(1.17), and also exploit the convexity of $P(x,t)$ in $x$. We
thus obtain that for any $x(x_0,t) \notin B(0)$ the inequalities

$$\frac{1}{2} \frac{dw(x)}{dt} \le \mu(t) f(x_*) - P(x(x_0,t), t) < 0$$

are satisfied. The function $w(x)$ is bounded everywhere on
$B(0)$. The inequality obtained implies that $w(x(x_0,t))$ is a
nonincreasing function of $t$ everywhere outside $B(0)$. Arguing
in the same way as in proving Theorem 1.1.2, we can show that
$w(x)$ is an infinitely large function. Taking into account
Theorem 2.2.1 and Remark thereto, we conclude that the system
(1.17) is Lagrange stable, i.e., each solution of this system is
extendable (to the right) as $t \to \infty$ and bounded.


Now we use Theorem 2.2.7. Let

$$G_\varepsilon = \{ x \in E^n : \operatorname{dis}(x, X_*) < \varepsilon \}, \qquad \theta(t) = \inf_{x \in E^n \setminus G_\varepsilon} \left[ \frac{P(x,t)}{\mu(t)} - f(x_*) \right].$$

Then for an arbitrary $\varepsilon > 0$ we have the estimate

$$\sup_{x \in E^n \setminus G_\varepsilon} \frac{dw(x)}{dt} \le -2\mu(t)\theta(t).$$

It is easy to show that for any $\varepsilon > 0$ there exists
$t(\varepsilon)$ such that $\rho(t) \le \varepsilon$ for all
$t \ge t(\varepsilon)$, and $\theta(t) > 0$ for the same values of
$t$. Since $\theta(t)$ is an increasing function of $t$, for
$t \ge t(\varepsilon)$ we have

$$-\mu(t)\theta(t) \le -\mu(t)\theta(t(\varepsilon)) < 0,$$

$$-\lim_{T \to \infty} \int_{t(\varepsilon)}^{T} \mu(t)\theta(t)\,dt \le -\theta(t(\varepsilon)) \lim_{T \to \infty} \int_{t(\varepsilon)}^{T} \mu(t)\,dt = -\infty.$$

All of the conditions of Theorem 2.2.7 are satisfied, implying
thereby that the method (1.17) converges to the set $X_*$. ///
Numerical implementation of the method (1.17) causes
difficulties due to the choice of the functions $\mu(t)$,
$\tau(t)$. Hence it is not advisable to use this method for simple
problems of small dimensionality, which are solvable through the
penalty function method invoking no simplifying techniques.
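The continuous version can be imitated by integrating the Cauchy problem numerically. The sketch below uses the explicit Euler scheme on a one-dimensional convex problem; the problem, the choice $\mu(t) = 1/(1+t)$, $\tau(t) = 1$ (so that $\int_0^\infty \mu\,dt = \infty$ and $\tau/\mu = 1 + t$ grows monotonically), the step size and all names are assumptions made for illustration.

```python
# Euler integration of dx/dt = -P_x(x, t) with P(x, t) = mu(t) * f(x) + tau(t) * S(x).
# Illustrative problem: f(x) = (x - 2)**2, h(x) = x - 1 <= 0, S(x) = max(0, x - 1)**2,
# whose solution is x* = 1.

def rhs(x, t):
    """Right-hand side -P_x(x, t) for mu(t) = 1 / (1 + t), tau(t) = 1."""
    mu = 1.0 / (1.0 + t)
    tau = 1.0
    grad_f = 2.0 * (x - 2.0)
    grad_S = 2.0 * max(0.0, x - 1.0)
    return -(mu * grad_f + tau * grad_S)


def integrate(x0, t_end, dt=0.01):
    """Explicit Euler method from t = 0 to t = t_end."""
    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        x += dt * rhs(x, t)
        t += dt
    return x


x_from_outside = integrate(3.0, 1000.0)    # infeasible starting point
x_from_inside = integrate(-1.0, 1000.0)    # feasible starting point
print(x_from_outside, x_from_inside)
```

Scaling the objective down through $\mu(t)$, rather than scaling the penalty up through $\tau(t)$, keeps the right-hand side from becoming stiff in this example; this is one way the choice of $\mu(t)$, $\tau(t)$ affects the numerical behaviour.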


2. ESTIMATION OF ACCURACY 


1. EXACT PENALTIES


There is a whole class of penalty functions having the remarkable
property: for each fixed, sufficiently large value of the
coefficient $\tau$, the set $X_*$ of solutions of the problem
(1.6.1) coincides with the set of solutions of the auxiliary
problem (1.4). We say that these functions are exact. The
utilization of exact penalty functions permits us to reduce the
constrained minimization problem (1.6.1) to the unconstrained
minimization problem (1.4). I. I. Eremin was the first to notice
this property, in 1966, for convex programming problems [1], [2];
then W. Zangwill [1], V. D. Skarin [1], T. Pietrzykowski [2],
C. Charalambous [1], S. Han and O. Mangasarian [1], and many other
authors studied this question.


Let us introduce the auxiliary vector functions:

$$F(x) = [\, |g^1(x)|, \dots, |g^e(x)|,\; h^1_+(x), \dots, h^c_+(x) \,],$$

$$D(x) = [\, g^1(x), \dots, g^e(x),\; h^1_+(x), \dots, h^c_+(x) \,],$$

mapping $R^n$ into $R^{e+c}$, and combine the dual vectors, letting
$y = [u,v] \in R^{e+c}$. If the problem (1.6.1) has a solution
$x = x_*$ and the corresponding dual vectors $u_* \in R^e$,
$v_* \in R^c_+$, we write the combination as
$y_* = [u_*, v_*] \in R^{e+c}$.

We shall show in the sequel that the Hölder norm of the vector
$F(x)$ is an exact penalty function (see Appendix II). The
auxiliary function then has the form

$$P(x,\tau) = f(x) + \tau \|F(x)\|_p, \tag{2.1}$$

where $1 \le p \le \infty$. We consider the set

$$Z(\tau) = \operatorname*{Arg\,min}_{x \in R^n} P(x,\tau)$$

depending on $\tau$ as on a parameter. Let

$$\tau_* = \|y_*\|_q, \tag{2.2}$$

where $\|y_*\|_q$ denotes the norm of the vector $y_*$ which is
dual to the norm $\|F(x)\|_p$ from (2.1).


THEOREM 3.2.1. In the problem (1.6.1) let there exist a saddle
point $[x_*, u_*, v_*]$ of the Lagrangian $L(x,u,v)$. Then for any
values $\tau > \tau_*$ the sets $X_*$ and $Z(\tau)$ coincide.
Proof. By Theorem 1.6.1 the point $x_*$ belongs to the set of
solutions $X_*$ of the problem (1.6.1); at the point
$[x_*, y_*]$ the complementarity condition is satisfied. Hence for
any $x \in R^n$

$$P(x_*, \tau) = f(x_*) = L(x_*, u_*, v_*) \le L(x, u_*, v_*).$$

The condition $v_* \ge 0$ implies the inequality
$\langle h(x), v_* \rangle \le \langle h_+(x), v_* \rangle$, using
which we can find

$$P(x_*, \tau) \le f(x) + \langle g(x), u_* \rangle + \langle h_+(x), v_* \rangle = f(x) + \langle y_*, D(x) \rangle.$$

Using the Hölder inequality (see Appendix II), we obtain

$$P(x_*, \tau) \le f(x) + \|y_*\|_q \|F(x)\|_p = f(x) + \tau_* \|F(x)\|_p.$$

Noting that $\tau_* \le \tau$ and taking into account the
definition (2.1), we extend these estimates:

$$P(x_*, \tau) \le f(x) + \tau \|F(x)\|_p = P(x,\tau). \tag{2.3}$$

This inequality holds for any $x \in R^n$, $\tau \ge \tau_*$, hence
the set $Z(\tau)$ is not empty and $X_* \subseteq Z(\tau)$. We
prove that the sets $X_*$ and $Z(\tau)$ coincide. We assume the
opposite, that for some $\tau_1 > \tau_*$ we can find a point
$x_1 \in Z(\tau_1)$ such that $x_1 \notin X_*$. Then

$$f(x_*) = P(x_*, \tau_1) = P(x_1, \tau_1) = f(x_1) + \tau_1 \|F(x_1)\|_p.$$

This is impossible if $x_1 \in X$, since in this case
$F(x_1) = 0$ and $f(x_1) = f(x_*)$. If $x_1 \notin X$, then
$\|F(x_1)\|_p \ne 0$ and for any $\tau_* \le \tau < \tau_1$ we have
the strict inequality

$$P(x_*, \tau) = f(x_*) > f(x_1) + \tau \|F(x_1)\|_p = P(x_1, \tau),$$

which contradicts the inequality (2.3). This implies that
$x_1 \in X_*$, hence $Z(\tau) \subseteq X_*$. Noting the inclusion
found before, $X_* \subseteq Z(\tau)$, we conclude that the sets
$X_*$ and $Z(\tau)$ coincide for $\tau > \tau_*$. ///

By Theorem 1.6.7 the condition in Theorem 3.2.1 for the
global saddle point of the Lagrangian to exist can be replaced in
convex programming problems by the conditions for the problem
(1.6.1) to have a solution, as well as by Slater's constraint
qualification. In the general problem of nonlinear programming,
the condition for the saddle point to exist in the problem (1.6.1)
can be replaced by McCormick's sufficient minimum conditions. We
have the following theorem.

THEOREM 3.2.2. Let the conditions of McCormick's Theorem 1.7.2
be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$ for the
problem (1.6.1). Then $x_*$ is a local minimum point of the
function $P(x,\tau)$ for any $\tau > \tau_*$, where $\tau_*$ is
defined from (2.2).
Proof. The function $L_1$ defined from (1.7.15) can be expressed
in terms of $F(x)$:

$$L_1(x,y) = f(x) + \langle y, F(x) \rangle.$$

For any $\tau > \tau_*$ one can find a vector $y = [u,v]$ such that
$u^i > |u^i_*|$ for all $i \in [1:e]$, $v^j > v^j_*$ for all
$j \in [1:c]$, and

$$\tau \ge \|y\|_q > \|y_*\|_q. \tag{2.4}$$

By Lemma 1.7.5 there exists a neighborhood $G(x_*)$ of the point
$x_*$ such that for all $x \in G(x_*)$, $x \ne x_*$ the conditions

$$f(x_*) = L_1(x_*, u, v) < L_1(x, u, v) \tag{2.5}$$

are satisfied.

Using the Hölder inequality and the conditions (2.4), (2.5), we
obtain

$$P(x_*, \tau) = f(x_*) < f(x) + \langle y, F(x) \rangle \le f(x) + \|y\|_q \|F(x)\|_p \le f(x) + \tau \|F(x)\|_p = P(x,\tau).$$

We have arrived again at the inequality (2.3). The further
considerations will be the same as those in proving Theorem
3.2.1. ///
Even when the functions defining the problem are everywhere
differentiable, the function (2.1) is not differentiable at
boundary points of the feasible set $X$. Hence such auxiliary
functions are referred to as non-differentiable; the use of them
simplifies the numerical implementation of the penalty function
method since there is no more need to let the penalty coefficient
go to infinity -- instead, it suffices to solve the problem (1.4)
only once for $\tau > \tau_*$. But the quantity $\tau_*$ is usually
unknown, and hence the problem (1.4) has to be solved for several
values of $\tau$. At the same time, the use of exact penalties
makes the auxiliary problem (1.4) more complex, since in that case
one needs to apply for unconstrained minimization only slowly
converging numerical methods which do not rely on differentiability
of the auxiliary function. Hence exact penalty functions are
usually applied in solving problems in which the functions
defining the problem are non-differentiable.
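The threshold behaviour of an exact penalty can be observed on a one-dimensional convex problem. The sketch below is an illustration under assumptions made here: the problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$, with solution $x_* = 1$ and multiplier $v_* = 2$, so $\tau_* = 2$), the derivative-free ternary search used because the auxiliary function has a kink, and all names.

```python
# Exact (non-differentiable) penalty P(x, tau) = f(x) + tau * max(0, h(x)).
# Illustrative problem: f(x) = (x - 2)**2, h(x) = x - 1 <= 0; x* = 1, v* = 2.

def P(x, tau):
    return (x - 2.0) ** 2 + tau * max(0.0, x - 1.0)


def argmin_convex(fun, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) <= fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


minimizers = {}
for tau in (0.5, 1.0, 2.5, 5.0, 50.0):
    minimizers[tau] = argmin_convex(lambda x, tau=tau: P(x, tau), -5.0, 5.0)
    print(f"tau = {tau:5.1f}   argmin P = {minimizers[tau]:.6f}")
```

For $\tau < 2$ the minimizer is $2 - \tau/2$, which is infeasible; for every $\tau > 2$ it sits exactly at the kink $x = 1$, so no limit $\tau \to \infty$ is needed.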


These results are easily transferable to the more general
case where the auxiliary function has the form

$$P(x,\tau) = f(x) + B(x,\tau), \tag{2.6}$$

where $B(x,\tau)$ is a continuous function of both arguments,
satisfying the three conditions:

1. $B(x,\tau) \ge \tau \|F(x)\|_p$ for all $x \in R^n$, $\tau > 0$;

2. $B(x,\tau) = 0$ for all $x \in X$;

3. $B(x,\tau)$ is a strictly increasing function of $\tau$ for all
$x \notin X$.

The following functions, for example, satisfy these conditions:

$$B(x,\tau) = e^{\tau \|F(x)\|_p} - 1,$$

$$B(x,\tau) = \tau \|F(x)\|_p \left( 1 + a \|F(x)\|_p \right), \quad a \ge 0.$$

The assertions of Theorems 3.2.1 and 3.2.2 remain valid if
the function $P(x,\tau)$ has the form (2.6).
Next we consider some particular cases. We assume that
$B(x,\tau) = \tau \|F(x)\|_p$. If $p = 1$, then $q = \infty$, and
the auxiliary function and the quantity $\tau_*$ are defined by the
formulas

$$P_1(x,\tau) = f(x) + \tau S(x), \qquad S(x) = \sum_{i=1}^{e} |g^i(x)| + \sum_{j=1}^{c} h^j_+(x), \tag{2.7}$$

$$\tau_{*1} = \max \left[ \max_{i \in [1:e]} |u^i_*|,\; \max_{j \in [1:c]} v^j_* \right] = \|y_*\|_\infty.$$

We considered minimization of $P_1(x,\tau)$ (see the formulas
(1.6.12)), starting from different considerations.
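The initial and dual norms coincide in the case of Euclidean
norms, when $p = q = 2$:

$$P_2(x,\tau) = f(x) + \tau \left[ \sum_{i=1}^{e} (g^i(x))^2 + \sum_{j=1}^{c} (h^j_+(x))^2 \right]^{1/2},$$

$$\tau_{*2} = \left[ \sum_{i=1}^{e} (u^i_*)^2 + \sum_{j=1}^{c} (v^j_*)^2 \right]^{1/2} = \|y_*\|_2.$$

If we put $p = \infty$, then $q = 1$,

$$P_\infty(x,\tau) = f(x) + \tau S^1(x),$$

$$\tau_{*\infty} = \sum_{i=1}^{e} |u^i_*| + \sum_{j=1}^{c} v^j_* = \|y_*\|_1,$$

where the non-separable penalty function (1.2) is used.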


Using the relation between distinct norms, given in Appendix
II, we obtain that $\tau_{*1} \le \tau_{*2} \le \tau_{*\infty}$.

Therefore, among the three auxiliary functions above the
function $P_1(x,\tau)$ has the smallest minimal penalty
coefficient, while the function $P_\infty$, whose penalty function
is a Chebyshev norm of the vector $F(x)$, has the largest value of
$\tau_*$.
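The ordering of the three thresholds is simply the ordering of the max-, Euclidean and sum-norms of the dual vector, as the following sketch shows. The multiplier values are hypothetical numbers invented for the illustration.

```python
import math

# Threshold penalty coefficients tau_* = ||y_*||_q for p = 1, 2, infinity.
u_star = [-1.5, 0.5]         # hypothetical multipliers of the equality constraints
v_star = [2.0, 0.0, 0.25]    # hypothetical multipliers of the inequality constraints (>= 0)
y_star = u_star + v_star

tau_1 = max(abs(c) for c in y_star)               # p = 1,   dual norm q = infinity
tau_2 = math.sqrt(sum(c * c for c in y_star))     # p = 2,   dual norm q = 2
tau_inf = sum(abs(c) for c in y_star)             # p = inf, dual norm q = 1

print(f"tau_*1 = {tau_1:.4f}  tau_*2 = {tau_2:.4f}  tau_*inf = {tau_inf:.4f}")
```

With many constraints the gap can be large: the sum-norm grows with the number of active multipliers, while the max-norm does not.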


2. ESTIMATION OF ACCURACY 


From Theorem 3.2.1 we can obtain an important result enabling us
to estimate the errors arising in using the penalty function
method.

THEOREM 3.2.3. In the problem (1.6.1) let there be a saddle point
$[x_*, u_*, v_*]$ of the Lagrangian $L(x,u,v)$. Then for any
$\tau > 0$, $p \ge 1$, $1/p + 1/q = 1$ and any $x \in R^n$ we have
the estimate

$$f(x_*) - \frac{\|y_*\|_q^2}{2\tau} \le f(x) + \frac{\tau}{2} \|F(x)\|_p^2.$$


Proof. This estimate is well known in the literature for the
case $p = q = 2$. It is proved, however, only for the case of
convex programming, using some cumbersome computations involving
convexity and differentiability. The result formulated in this
theorem is of a more general kind and its justification is
extremely elementary. Indeed, in proving Theorem 3.2.1 we
obtained the inequality

$$f(x_*) \le f(x) + \|F(x)\|_p \|y_*\|_q.$$

Using the well-known inequality $2ab \le a^2 + b^2$ with

$$a = \|y_*\|_q / \sqrt{\tau}, \qquad b = \sqrt{\tau}\, \|F(x)\|_p,$$

we immediately arrive at the required inequality. ///
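An estimate of the form $f(x_*) - \|y_*\|_q^2/(2\tau) \le f(x) + (\tau/2)\|F(x)\|_p^2$ can be checked numerically on a small example. The sketch below does so for a one-dimensional problem chosen here for illustration (minimize $(x-2)^2$ subject to $x - 1 \le 0$, with $f(x_*) = 1$ and $y_* = v_* = 2$); the grid, the values of $\tau$ and all names are assumptions.

```python
# Numerical check of  f(x*) - |y*|**2 / (2*tau)  <=  f(x) + (tau/2) * F(x)**2.
# Illustrative problem: f(x) = (x - 2)**2, h(x) = x - 1 <= 0, so F(x) = max(0, x - 1).

def f(x):
    return (x - 2.0) ** 2


def F(x):
    return max(0.0, x - 1.0)


f_star, y_star = 1.0, 2.0     # optimal value and multiplier of this problem


def gap(x, tau):
    """Right-hand side minus left-hand side of the estimate (should be >= 0)."""
    return f(x) + 0.5 * tau * F(x) ** 2 - (f_star - y_star ** 2 / (2.0 * tau))


grid = [-3.0 + 0.001 * i for i in range(8001)]
worst = {tau: min(gap(x, tau) for x in grid) for tau in (0.5, 2.0, 10.0, 100.0)}
for tau, value in worst.items():
    print(f"tau = {tau:6.1f}   smallest gap on the grid = {value:.6f}")
```

For this problem the smallest gap is $4/(\tau(2+\tau))$, so the two-sided bracket around $f(x_*)$ tightens like $1/\tau$ as the penalty coefficient grows.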


The technique of our proof can be developed further. We use the
Young-Minkowski inequalities:

$$\langle a,b\rangle \le \varphi(a) + \varphi^*(b), \qquad \langle a,b\rangle \le \psi(a)\,\psi^\circ(b).$$

Here $a, b \in R^n$, $\varphi^*(b)$ is the conjugate of $\varphi(a)$, and $\psi^\circ(b)$ is the
polar of the function $\psi(a)$:

$$\varphi^*(b) = \sup_{a\in A}\,[\langle a,b\rangle - \varphi(a)], \qquad
\psi^\circ(b) = \inf\{\mu : \langle a,b\rangle \le \mu\,\psi(a)\ \ \forall a \in A\},$$

where the set $A \subseteq R^n$. The functions $\varphi^*(b)$ and $\psi^\circ(b)$ are
defined to be the "best" functions satisfying the above inequalities;
we cannot replace $\varphi^*(b)$ and $\psi^\circ(b)$ by any other functions
of $b$ to strengthen these inequalities. For $p > 1$ these
inequalities yield the following estimates:

$$\langle a,b\rangle \le \frac{1}{p}\|a\|_p^p + \frac{1}{q}\|b\|_q^q, \qquad
\langle a,b\rangle \le \|a\|_p\,\|b\|_q, \qquad
\|a\|_p = \Bigl[\sum_{i=1}^{n} |a^i|^p\Bigr]^{1/p}.$$
The second relation is the Hölder inequality. For $0 < p < 1$, $a \ge 0$,
$b > 0$ (componentwise) we have

$$\langle a,b\rangle \ge \frac{1}{p}\|a\|_p^p + \frac{1}{q}\|b\|_q^q, \qquad
\langle a,b\rangle \ge \|a\|_p\,\|b\|_q .$$


Here we need to explain in more detail the use of the sign
of the norm for $q < 0$. The first condition for the norm,
$\|0\|_q = 0$, is not satisfied here. Hence $\|a\|_q$ for $a \in R^n$
should be treated in the formal way, starting from the following
definition:

$$\|a\|_q = \Bigl[\sum_{i=1}^{n} |a^i|^q\Bigr]^{1/q} .$$


Using the inequalities given above, we obtain

$$f(x_*) - \frac{1}{q}\|y_*\|_q^q \le f(x) + \frac{1}{p}\|F(x)\|_p^p .$$


If the problem involves no equality-type constraints, then
we have
ec : F eG al
1 ad h
DCs ean - in vy] apt ace T 2° Ca P(x,t) ;
where the function $P(x,\tau)$ does not satisfy the conditions imposed
on auxiliary functions of the form (1.3). For this function
the equality $f(x) = P(x,\tau)$ is satisfied not on the entire feasible
set, but only in the limit as $\tau \to \infty$. Nevertheless, this
class of "nearly penalty functions" can be successfully employed
in solving nonlinear programming problems.


We may add the Lagrangian to the auxiliary function. For example, let

$$Z(y,\tau) = \operatorname{Arg\,min}_{x\in R^n} P(x,y,\tau), \qquad
P(x,y,\tau) = f(x) + \langle u, g(x)\rangle + \langle v, h_+(x)\rangle + \tau\|F(x)\|_p .$$

Suppose in the problem (1.6.1) there exists a saddle point
$[x_*, u_*, v_*]$ of the Lagrangian and also $\tau > \|y - y_*\|_q$. It is
easy to show that then $x_* \in X_*$, $f(x_*) \le P(x,y,\tau)$ for any $x$
and, therefore, $X_* = Z(y,\tau)$. Thus, the use of Lagrangian multipliers
enables us to create a new class of exact functions. From
the computational point of view this result is quite useful because
it demonstrates that even an approximate value of the dual
vectors $u$ and $v$ allows us to decrease the minimal penalty
coefficient (2.2). This result implies as a particular case the
assertion of Theorem 3.2.1. Indeed, letting $y = 0$, we arrive at
the auxiliary function of that theorem. If in the formula for $P$ we take
$\langle v, h(x)\rangle$ instead of $\langle v, h_+(x)\rangle$, then to satisfy the given
properties we need to add the condition $v_* \ge v$.


Similar estimates can be obtained for distinct combined penalties
when the scalar products $\langle u_*, g(x)\rangle$ and $\langle v_*, h_+(x)\rangle$ are
estimated through different formulas. The concepts of the conjugate
and the polar functions are quite useful in studying nonlinear
programming problems. For example, crucial estimates can
be obtained in a simple way, without assuming the convexity or
differentiability of the functions defining the problem. Further
elaboration of this approach will be found in Section 3.4. Let

$$P(x,\tau) = f(x) + \tau\,\|F(x)\|^2 \tag{2.8}$$

be the auxiliary function. Here and throughout the next section
we use the Euclidean norm. Then

$$P(x,\tau) \ge f(x_*) - \frac{1}{4\tau}\Bigl[\sum_{i=1}^{e} (u^i_*)^2 + \sum_{j=1}^{c} (v^j_*)^2\Bigr]. \tag{2.9}$$

This implies that $Z(\tau) \to X_*$ as $\tau \to \infty$; in other words, the error
in the value of the function being minimized, when
the penalty function method is used, tends to zero in proportion
to $1/\tau$.


3. DIFFERENTIABLE PENALTIES 


We consider the problem (1.6.1) where the feasible set is defined
only via constraints of the equality type. We assume that the
penalty coefficient is sufficiently large and $\varepsilon = 1/\tau$ can be
regarded as a small parameter. Let the second simplified version
(1.11) be implemented, in which the point $x(\tau)$ satisfies the
condition
$$\|P_x(x,\tau)\| \le \varepsilon, \qquad P(x,\tau) = f(x) + \tau\,\|g(x)\|^2 .$$


Introducing the auxiliary vector $u \in E^e$ and the vector
$a \in E^n$, $\|a\| \le 1$, we rewrite this condition in the form of a
system of $n+e$ nonlinear equations

$$f_x(x) + g_x(x)\,u = \varepsilon a, \qquad 2g(x) - \varepsilon u = 0. \tag{2.10}$$
Now we investigate the dependence of $x$, $u$ on the small parameter
$\varepsilon > 0$, generated by this system. Assume that in the initial
problem (1.6.1) there exists a Kuhn-Tucker point $[x_*, u_*]$
at which the sufficient minimum conditions given by Theorem 1.7.2
are satisfied, and also the constraint qualification holds at
the point $x_*$.
We shall seek the solution to (2.10) for $\varepsilon > 0$, using the
formal power series expansion:

$$x(\varepsilon) = x_* + \varepsilon x_1 + \varepsilon^2 x_2 + \dots, \qquad
u(\varepsilon) = u_* + \varepsilon u_1 + \varepsilon^2 u_2 + \dots \tag{2.11}$$

We substitute these series into (2.10), assuming that the functions
$f$ and $g$ are differentiable as many times as required.
We expand $f$ and $g$ in a series of powers of $\varepsilon$. Equating the
expressions for equal powers of $\varepsilon$, we obtain a system of linear
equations for defining the coefficients in (2.11). In the zero
approximation we have


$$L_x(x_*,u_*) = f_x(x_*) + g_x(x_*)\,u_* = 0, \qquad g(x_*) = 0, \tag{2.12}$$

that is, we obtain the Kuhn-Tucker conditions. For the first-power
terms we obtain

$$L_{xx}(x_*,u_*)\,x_1 + g_x(x_*)\,u_1 = a, \qquad 2g_x^T(x_*)\,x_1 - u_* = 0. \tag{2.13}$$

From this we obtain

$$x_* + \varepsilon x_1 = x_* + \varepsilon L_{xx}^{-1}(x_*,u_*)\,[a - g_x(x_*)\,u_1],$$

$$u_* + \varepsilon u_1 = u_* + \varepsilon\,[2g_x^T(x_*)L_{xx}^{-1}(x_*,u_*)\,g_x(x_*)]^{-1}\,[2g_x^T(x_*)L_{xx}^{-1}(x_*,u_*)\,a - u_*]. \tag{2.14}$$


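Let us investigate this approximation. We compute the values of
$f$ and $g$:

$$f(x_* + \varepsilon x_1) = f(x_*) + \varepsilon f_x^T(x_*)\,x_1 + O(\varepsilon^2),$$

$$g(x_* + \varepsilon x_1) = g(x_*) + \varepsilon g_x^T(x_*)\,x_1 + O(\varepsilon^2).$$

Using (2.12), (2.13), we obtain

$$f(x_* + \varepsilon x_1) = f(x_*) - \frac{\varepsilon}{2}\|u_*\|^2 + O(\varepsilon^2), \qquad
g(x_* + \varepsilon x_1) = \frac{\varepsilon}{2}u_* + O(\varepsilon^2). \tag{2.15}$$

For the auxiliary function we have

$$P(x_* + \varepsilon x_1, 1/\varepsilon) = f(x_*) - \frac{\varepsilon}{4}\|u_*\|^2 + O(\varepsilon^2). \tag{2.16}$$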
We have arrived at an estimate close to (2.9). From the
formulas (2.15), (2.16) it follows that the points $x(\varepsilon)$ are such
that the computed values of the auxiliary and of the objective
functions are smaller than the value $f(x_*)$ by quantities of the
order $\varepsilon$. The signs of the components of the vector $g(x(\varepsilon))$
coincide with the signs of the corresponding components of the
vector $u_*$. The choice of the concrete point $x$ satisfying the
condition (1.11) has no influence (up to second-order terms) on
the values of $f(x(\varepsilon))$, $g(x(\varepsilon))$.
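
The first-order formulas (2.15) and (2.16) can be checked on a small example. The sketch below assumes a toy problem that is not from the text (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 2 = 0$, so that $x_* = (1,1)$, $u_* = -2$), for which the minimizer of $P(x, 1/\varepsilon)$ is available in closed form; the function names are illustrative.

```python
# Toy problem (an assumption, not from the book):
#   minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 2 = 0.
# Solution x_* = (1, 1), multiplier u_* = -2, f(x_*) = 2.

def penalty_minimizer(eps):
    """Exact minimizer of P(x, 1/eps) = f(x) + g(x)^2 / eps (x1 = x2 by symmetry)."""
    t = 2.0 / (eps + 2.0)
    return (t, t)

def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return x[0] + x[1] - 2.0

F_STAR, U_STAR = 2.0, -2.0

def report(eps):
    """Deviation of the computed values from the first-order predictions."""
    x = penalty_minimizer(eps)
    p = f(x) + g(x) ** 2 / eps
    return {
        "f_err": f(x) - (F_STAR - 0.5 * eps * U_STAR ** 2),   # compare with (2.15)
        "g_err": g(x) - 0.5 * eps * U_STAR,                   # compare with (2.15)
        "P_err": p - (F_STAR - 0.25 * eps * U_STAR ** 2),     # compare with (2.16)
    }

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, report(eps))
```

In this example each deviation shrinks roughly like $\varepsilon^2$ as $\varepsilon$ decreases, which is what the $O(\varepsilon^2)$ remainders in (2.15), (2.16) suggest.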


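We show that the formulas (2.14) give an asymptotic approximation
of the solutions $x(\varepsilon)$ and $u(\varepsilon)$ of the system (2.10) as
$\varepsilon \to 0$ up to second-order terms, i.e.,

$$\limsup_{\varepsilon\to 0} \frac{1}{\varepsilon^2}\|x(\varepsilon) - x_* - \varepsilon x_1\| < \infty, \qquad
\limsup_{\varepsilon\to 0} \frac{1}{\varepsilon^2}\|u(\varepsilon) - u_* - \varepsilon u_1\| < \infty. \tag{2.17}$$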


THEOREM 3.2.4. Let $[x_*, u_*]$ denote the Kuhn-Tucker point for the
problem (1.6.1); let at the point $x = x_*$ the constraints
$g(x) = 0$ satisfy the constraint qualification; let the functions
defining the problem be twice differentiable in the neighborhood
of the point $x_*$, where their matrices of the second derivatives
satisfy a Lipschitz condition, and let the matrix $L_{xx}(x_*,u_*)$
be non-singular. Then the formulas (2.14) yield an asymptotic
approximation of solutions of the system (2.10) as $\varepsilon \to 0$ up to
terms of order $\varepsilon^2$.


Proof. Let

$$x(\varepsilon) = x_* + \varepsilon x_1 + \delta x, \qquad u(\varepsilon) = u_* + \varepsilon u_1 + \delta u,$$

$$\delta z = [\delta x, \delta u] \in E^{n+e}, \qquad z = [\varepsilon x_1 + \delta x,\ \varepsilon u_1 + \delta u] \in E^{n+e}.$$

Using the differentiability property and (2.12), we can write the system
(2.10) as

$$f_x(x) + g_x(x)\,u = L_{xx}(x_*,u_*)(\varepsilon x_1 + \delta x) + g_x(x_*)(\varepsilon u_1 + \delta u) + \psi_1(z) = \varepsilon a,$$

$$2g(x) - \varepsilon u = 2g_x^T(x_*)(\varepsilon x_1 + \delta x) - \varepsilon(u_* + \varepsilon u_1 + \delta u) + \psi_2(\varepsilon x_1 + \delta x) = 0. \tag{2.18}$$

Here the functions $\psi_1(z)$, $\psi_2(x)$ are such that for all $z \in E^{n+e}$,
$x \in E^n$ we have the inequalities $\|\psi_1(z)\| \le c_1\|z\|^2$, $\|\psi_2(x)\| \le c_2\|x\|^2$,
where $c_1$, $c_2$ are constants. We can justify this representation
in the same way as we prove the formula (5) in Appendix I. Let


as =i (2) = 2 
1) er acennla “1 lat : 
Then noting (2.13), we can write the relations (2.18) as

$$\delta z = N^{-1}\Psi(\varepsilon z_1 + \delta z). \tag{2.19}$$

The matrix $N^{-1}$ is bounded; hence there exists $c$ such that

$$\|N^{-1}\Psi(\varepsilon z_1 + \delta z)\| \le c\,\|\varepsilon z_1 + \delta z\|^2 .$$


a 


By Lemma 2.3.2 the equation (2.19) has a unique solution if 


2—1 
elajcV—. . 


It is always possible to obtain this inequality if $\varepsilon$ is sufficiently
small. Then the solution $\delta z$ obtained from (2.19)
satisfies the condition

$$\|\delta z\| \le 2c\,\varepsilon^2\,[\|x_1\| + \|u_1\|].$$

Or, returning to the original variables, we obtain

$$\frac{1}{\varepsilon^2}\bigl[\|x(\varepsilon) - x_* - \varepsilon x_1\| + \|u(\varepsilon) - u_* - \varepsilon u_1\|\bigr] \le 2c\,[\|x_1\| + \|u_1\|],$$

where the right-hand side of the inequality does not depend on $\varepsilon$.
Passing to the limit as $\varepsilon \to 0$, we obtain the conditions (2.17),


proving the Theorem. ///


4. EXTRAPOLATION IN THE PENALTY FUNCTION METHOD


The results obtained in the previous Subsection are quite useful
for a qualitative study of the method. At the same time the approach
can be exploited in the numerical implementation. If the
quantities $\varepsilon_k = 1/\tau_k$ change only slightly from one iteration to
the other, an extrapolation of the vectors $x_k$ is useful. Let
the second simplified version be implemented for a known value of
$x_k$ satisfying the condition (1.11). Let

$$a_k = \frac{1}{\varepsilon_k}\Bigl[f_x(x_k) + \frac{2}{\varepsilon_k}\,g_x(x_k)\,g(x_k)\Bigr]. \tag{2.20}$$

Here $\|a_k\| \le 1$. For the approximated dual vector we take

$$u_k = \frac{2}{\varepsilon_k}\,g(x_k). \tag{2.21}$$
: es 


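We introduce a small parameter $\mu$ and consider the problem
of solving the system

$$f_x(x) + g_x(x)\,u = (\varepsilon_k - \mu)\,a_k, \qquad 2g(x) = (\varepsilon_k - \mu)\,u. \tag{2.22}$$

The formal power series in $\mu$ is:

$$x(\mu) = x_k + \mu x_1 + \mu^2 x_2 + \dots, \qquad
u(\mu) = u_k + \mu u_1 + \mu^2 u_2 + \dots \tag{2.23}$$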


Substituting (2.23) into (2.22), we make a power series expansion
in $\mu$. Equating the expressions for equal powers of $\mu$, we obtain
a system of linear equations for defining the coefficients in
(2.23). For $\mu = 0$ we obtain (2.20), (2.21) and next:

$$L_{xx}(x_k,u_k)\,x_1 + g_x(x_k)\,u_1 = -a_k, \qquad
2g_x^T(x_k)\,x_1 - \varepsilon_k u_1 = -u_k .$$

Having solved this system, we substitute into (2.23) the expressions
for $x_1$, $u_1$ and obtain

$$x(\mu) = x_k - \mu L_{xx}^{-1}(x_k,u_k)\,[a_k + g_x(x_k)\,u_1] + O(\mu^2),$$

$$u(\mu) = u_k + \mu u_1 + O(\mu^2),$$

$$u_1 = [\varepsilon_k I + 2g_x^T(x_k)L_{xx}^{-1}(x_k,u_k)\,g_x(x_k)]^{-1}\,\bigl(u_k - 2g_x^T(x_k)L_{xx}^{-1}(x_k,u_k)\,a_k\bigr).$$


Assuming $\mu = \varepsilon_k - \varepsilon_{k+1}$, we find the approximated vectors $x_{k+1}$,
$u_{k+1}$; the former can be taken as an initial vector when
minimizing $P(x, \tau_{k+1})$. If the computations via the
penalty function method are terminated upon finding the vector
$x_k$, then for an approximate solution it is appropriate to take,
instead of the vectors $x_k$, $u_k$, the vectors $x_{k+1}$ and $u_{k+1}$
obtained from the formulas found via an extrapolation to the
point $\mu = \varepsilon_k$:

$$x_{k+1} = x_k - L_{xx}^{-1}(x_k,u_k)\,[L_x(x_k,u_k) + g_x(x_k)\,u], \qquad u_{k+1} = u_k + u, \tag{2.24}$$

$$u = [g_x^T(x_k)L_{xx}^{-1}(x_k,u_k)\,g_x(x_k)]^{-1}\,[g(x_k) - g_x^T(x_k)L_{xx}^{-1}(x_k,u_k)\,L_x(x_k,u_k)], \tag{2.25}$$


where the following representation of the auxiliary function gradient
has been used:

$$P_x(x_k,\tau_k) = L_x(x_k,u_k) = \varepsilon_k a_k ,$$

the validity of which follows from (2.20) and (2.21); furthermore,
we have noted that $\varepsilon_k$ is small, which made the expression for
$u_1$ simpler.
It is possible to abstract from the extrapolation problem and
regard the formulas (2.24) and (2.25) as some numerical method for
solving the problem (1.6.1); in this method neither the penalty
function nor the penalty coefficient is used. It is easily
seen that the formulas (2.24) and (2.25) define nothing but Newton's
method applied to solving the system of $n+e$ equations

$$L_x(x,u) = 0, \qquad g(x) = 0.$$

Methods of the form (2.24), (2.25) will be obtained in the
next chapter on the basis of quite different arguments. The approach
presented above points out the relationship between these
methods which seem to belong to quite different classes.
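
As an illustration of the iteration (2.24), (2.25), the sketch below applies it to a toy problem that is an assumption of this example and not taken from the text: minimize $x_1 + x_2$ on the unit circle $g(x) = x_1^2 + x_2^2 - 1 = 0$, whose Kuhn-Tucker point is $x_* = (-1/\sqrt{2}, -1/\sqrt{2})$, $u_* = 1/\sqrt{2}$. The starting point is chosen arbitrarily close to the solution.

```python
import math

# Toy problem (an assumption): minimize x1 + x2 subject to
# g(x) = x1^2 + x2^2 - 1 = 0.  With L(x,u) = f(x) + u*g(x):
# L_x = (1 + 2*u*x1, 1 + 2*u*x2) and L_xx = 2*u*I.

def newton_step(x, u):
    lx = [1.0 + 2.0 * u * x[0], 1.0 + 2.0 * u * x[1]]   # L_x(x_k, u_k)
    gx = [2.0 * x[0], 2.0 * x[1]]                       # g_x(x_k)
    gval = x[0] ** 2 + x[1] ** 2 - 1.0                  # g(x_k)
    inv = 1.0 / (2.0 * u)                               # L_xx^{-1} = inv * I
    # (2.25): u = [g_x^T L_xx^{-1} g_x]^{-1} [g - g_x^T L_xx^{-1} L_x]
    s = inv * (gx[0] ** 2 + gx[1] ** 2)
    du = (gval - inv * (gx[0] * lx[0] + gx[1] * lx[1])) / s
    # (2.24): x_{k+1} = x_k - L_xx^{-1} [L_x + g_x u],  u_{k+1} = u_k + u
    x_new = [x[i] - inv * (lx[i] + gx[i] * du) for i in range(2)]
    return x_new, u + du

x, u = [-0.8, -0.6], 0.5          # arbitrary starting guess near the solution
for k in range(8):
    x, u = newton_step(x, u)
    print(k, x, u)
```

No penalty coefficient appears anywhere in the loop, and the iterates settle on the Kuhn-Tucker point after a few steps, as expected of Newton's method started sufficiently close to the solution.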

From a computational point of view, the use of extrapolation
is justified if to solve the auxiliary problem (1.11) the methods
defining the matrix $L_{xx}^{-1}(x_k,u_k)$ have been used; then the
conversion of $x_k$ and $u_k$ is not difficult. Otherwise the
extrapolation can be simplified. According to the results obtained, we
can assume that the function $x(\mu)$ depends linearly on $\mu$, and
hence for $x_{k+1}$ we take the vector

$$x_{k+1} = x_k + (x_k - x_{k-1})\,\frac{\varepsilon_{k+1} - \varepsilon_k}{\varepsilon_k - \varepsilon_{k-1}} . \tag{2.26}$$
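
A minimal sketch of the simplified extrapolation (2.26) follows. It reuses an assumed toy problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 2$), for which the exact minimizer of $f + g^2/\varepsilon$ is $x_1 = x_2 = 2/(\varepsilon + 2)$; the function names and the values of $\varepsilon$ are illustrative only.

```python
def extrapolate(x_prev, x_cur, eps_prev, eps_cur, eps_next):
    """Linear extrapolation (2.26) of the penalty minimizer in eps = 1/tau."""
    w = (eps_next - eps_cur) / (eps_cur - eps_prev)
    return [xc + (xc - xp) * w for xp, xc in zip(x_prev, x_cur)]

# Assumed toy example: exact minimizer of f + g^2/eps for
# f = x1^2 + x2^2, g = x1 + x2 - 2.
def exact(eps):
    t = 2.0 / (eps + 2.0)
    return [t, t]

eps_prev, eps_cur, eps_next = 0.2, 0.1, 0.05
guess = extrapolate(exact(eps_prev), exact(eps_cur), eps_prev, eps_cur, eps_next)
target = exact(eps_next)

err_extrapolated = abs(guess[0] - target[0])       # start from the extrapolated point
err_warm_start = abs(exact(eps_cur)[0] - target[0])  # start from x_k itself
print(err_extrapolated, err_warm_start)
```

In this example the extrapolated vector is more than ten times closer to the next minimizer than $x_k$ itself, which is the intended benefit when it is used as the initial point of the next minimization.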


5. CONTINUATION METHODS 


We assume that the system (2.22) for each $\mu \in [0, \varepsilon_k]$ has the
solution $[x(\mu), u(\mu)]$ depending continuously on $\mu$. As the
parameter $\mu$ goes from $0$ to $\varepsilon_k$, the point
$[x(\mu), u(\mu)]$ describes some spatial curve in $E^{n+e}$, one endpoint
of which is $[x(0), u(0)]$, being the solution to the system
(2.10), the other endpoint being the Kuhn-Tucker point

$$[x(\varepsilon_k), u(\varepsilon_k)] = [x_*, u_*].$$


If the conditions of Lemma A eae Satis med t he mata 
bad 


N (Xe, Us) = err aeal 


is non-singular; and by the theorem on implicit functions, for
sufficiently small $\varepsilon_k$ the system (2.22) has a unique solution
passing through the point $[x_*, u_*]$; moreover, the functions
$x(\mu)$, $u(\mu)$ are differentiable and satisfy the following system
of $n+e$ ordinary differential equations:

$$L_{xx}(x,u)\,\frac{dx}{d\mu} + g_x(x)\,\frac{du}{d\mu} = -a_k, \qquad
2g_x^T(x)\,\frac{dx}{d\mu} - (\varepsilon_k - \mu)\,\frac{du}{d\mu} = -u. \tag{2.27}$$

Integrating this system over $\mu$ from $\mu = 0$ to $\mu = \varepsilon_k$ with
the Cauchy initial data $x(0) = x_0$, $u(0) = u_0$, where $x_0$, $u_0$
denote the solutions of the system (2.10), we obtain the Kuhn-Tucker
point sought.

The procedure described can be regarded as a numerical method
of solving the problem (1.6.1). The method consists of two stages:
an approximate minimization of the function $P(x, 1/\varepsilon)$ in $x$ and
a numerical solution of the Cauchy problem for (2.27). In contrast
to the method (2.2.3), this method involves integration of
the system (2.27) with high accuracy; hence if the Euler method is
used, one needs to take a sufficiently small step of integration,
or to use more exact methods (the Euler method with conversion,
the Runge-Kutta method, and others). Because of the integration errors for
(2.27), the computed solutions $x(\mu)$, $u(\mu)$ no longer
satisfy the system (2.22), and an additional correction is required.
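
The two-stage procedure can be sketched on a small example. The code below is an illustration under stated assumptions: the toy problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 2$), the choice $a_k = 0$ (exact minimization at the first stage), the value $\varepsilon_k = 0.5$ and the classical fourth-order Runge-Kutta scheme are all choices made here, not prescriptions of the text.

```python
# Toy problem (assumed): f = x1^2 + x2^2, g = x1 + x2 - 2, with a_k = 0,
# so L_xx = 2*I and g_x = (1, 1)^T.  System (2.27) then reads
#   2*x' + g_x*u' = 0,   2*g_x^T x' - (eps_k - mu)*u' = -u.

def rhs(mu, state, eps_k):
    x1, x2, u = state
    # Eliminate x' = -0.5*g_x*u' using the first equation of (2.27),
    # then solve the second equation for u'.
    du = u / (2.0 + (eps_k - mu))
    return [-0.5 * du, -0.5 * du, du]

def rk4(state, eps_k, steps):
    """Integrate (2.27) from mu = 0 to mu = eps_k with classical RK4."""
    h = eps_k / steps
    mu = 0.0
    for _ in range(steps):
        k1 = rhs(mu, state, eps_k)
        k2 = rhs(mu + h / 2, [s + h / 2 * k for s, k in zip(state, k1)], eps_k)
        k3 = rhs(mu + h / 2, [s + h / 2 * k for s, k in zip(state, k2)], eps_k)
        k4 = rhs(mu + h, [s + h * k for s, k in zip(state, k3)], eps_k)
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        mu += h
    return state

eps_k = 0.5
t0 = 2.0 / (eps_k + 2.0)                 # stage 1: exact penalty minimizer, x1 = x2 = t0
u0 = 2.0 * (2.0 * t0 - 2.0) / eps_k      # u = 2 g(x) / eps_k, cf. (2.21)
x1, x2, u = rk4([t0, t0, u0], eps_k, steps=20)
print(x1, x2, u)
```

For this example the integration ends close to the Kuhn-Tucker point $x_* = (1, 1)$, $u_* = -2$; with a cruder integrator or fewer steps the end point drifts away from the curve defined by (2.22), which is the situation in which the additional correction mentioned above is needed.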


This approach has been borrowed from the methods for solving
nonlinear equations known as "continuation" or "homotopy" methods.
A more detailed description of this method can be found in
J. Ortega and W. Rheinboldt [1], and in Davidenko [1], [2]. This
approach has not, however, been widely used for solving
optimization problems.


3. THE COST FUNCTION PARAMETRIZATION METHOD 


In this section we shall describe a class of methods for solving
the problem (1.6.1), close to the penalty function method, based
on the parametrization of the cost function; the main difference
is that in the parametrization method the auxiliary parameter changes
automatically according to a prescribed rule, while in the penalty
function method the variation of the penalty coefficients has to
be preassigned. Various versions of the method are given in
B. N. Pshenichnyj [1], V. V. Velichenko [1], [3], [4], Yu. P.
Ivanilov [1], B. S. Razumikhin [1], D. Morrison [1], F. Lootsma
[1], V. V. Ivanov and V. A. Lyudvichenko [1], and many other
works. We shall dwell only on a few versions of the method.


1. PRELIMINARY RESULTS


Suppose we are solving the problem (1.6.1). Using the penalty
function used in Section 3.1, we make up the auxiliary function

$$M(x,\eta) = (f(x) - \eta)^2 + S(x). \tag{3.1}$$

We introduce the set

$$X(\eta) = \operatorname{Arg\,min}_{x\in E^n} M(x,\eta). \tag{3.2}$$


LEMMA 3.3.1. If $x_* \in X_*$, $\eta \le f(x_*)$, $x_1 \in X(\eta)$, then

$$f(x_1) \le f(x_*).$$

Proof. From the condition $x_1 \in X(\eta)$ we obtain

$$(f(x_1) - \eta)^2 \le M(x_1,\eta) \le M(x_*,\eta) = (f(x_*) - \eta)^2 . \tag{3.3}$$

If $f(x_1) \ge \eta$, it follows from (3.3) that

$$f(x_1) - \eta \le f(x_*) - \eta .$$

We then have that $f(x_1) \le f(x_*)$. If $f(x_1) < \eta$, then
$f(x_1) < \eta \le f(x_*)$; thus, we have arrived again at the required
inequality. ///


LEMMA 3.3.2. Let $\eta = f(x_*)$, $x_* \in X_*$; then $X_* = X(\eta)$,
$M(x_*,\eta) = 0$.

Proof. By definition of the function $M$ we have the relation $M(x_*,\eta) = 0$,
implying $x_* \in X(\eta)$.
Conversely, let $x_1 \in X(\eta)$. In this case for any $x \in E^n$

$$0 \le M(x_1,\eta) \le M(x,\eta);$$

in particular,

$$0 \le M(x_1,\eta) \le M(x_*,\eta) = 0 .$$

This is possible only if $S(x_1) = 0$, i.e., the point $x_1$ is
feasible and, in addition, such that $f(x_1) = \eta = f(x_*)$; hence
$x_1 \in X_*$. We then conclude that $X_* = X(\eta)$. ///
LEMMA 3.3.3. Let $\eta_1 < \eta_2$, $x_1 \in X(\eta_1)$, $x_2 \in X(\eta_2)$. Then
$f(x_1) \le f(x_2)$.
Proof. For any $x$ we have the inequalities

$$0 \le (f(x_1) - \eta_1)^2 + S(x_1) \le (f(x) - \eta_1)^2 + S(x), \tag{3.4}$$

$$0 \le (f(x_2) - \eta_2)^2 + S(x_2) \le (f(x) - \eta_2)^2 + S(x). \tag{3.5}$$

In (3.4) we take as $x$ the point $x_2$, and in (3.5), the point
$x_1$. We obtain

$$(f(x_1) - \eta_1)^2 + S(x_1) \le (f(x_2) - \eta_1)^2 + S(x_2),$$

$$(f(x_2) - \eta_2)^2 + S(x_2) \le (f(x_1) - \eta_2)^2 + S(x_1).$$

Adding them and making some transformations, we obtain

$$(\eta_2 - \eta_1)\,(f(x_1) - f(x_2)) \le 0,$$

yielding the required inequality: $f(x_1) \le f(x_2)$. ///
It follows from the Lemma that the value of the cost function
$f(x)$, for $x \in X(\eta)$, is a monotonically non-decreasing function of
the parameter $\eta$.


2. THE FIRST VERSION DUE TO D. MORRISON 


We assume that some lower estimate $\eta_0$ of the optimal cost function
value is known, so that $\eta_0 \le f(x_*)$, $x_* \in X_*$. We can obtain
this estimate, for example, by solving the auxiliary problem of
the unconstrained minimization (1.4); by (2.9) it is then possible
to compute a value $\eta_0 \le f(x_*)$. The cost-function parametrization method
consists in sequential determination of the points $x_k$ and increasing
the parameter $\eta_k$ so that $\eta_k \to f(x_*)$. By the lemmas proved
in Section 3.1, $X(\eta_k) \to X_*$, ensuring the convergence of the
method to the set $X_*$. The key point here is the rule of changing
the parameter from one iteration to another. D. Morrison [1]
suggests the following version.

On the $k$-th iteration, let some value of the parameter
$\eta_k \le f(x_*)$ be known. From a solution of the auxiliary problem
(3.2) we find the point $x_k \in X(\eta_k)$. If
$M(x_k,\eta_k) = 0$, the calculations are over since in this case
$S(x_k) = 0$, $x_k \in X_*$. Otherwise we put

$$\eta_{k+1} = \eta_k + \sqrt{M(x_k,\eta_k)} \tag{3.6}$$

and the iterative process continues. We prove the convergence
of the version described.


LEMMA 3.3.4. If $\eta_k \le f(x_*)$, then $\eta_{k+1} \le f(x_*)$.

Proof. For any $x \in E^n$ we have

$$M(x_k,\eta_k) \le M(x,\eta_k).$$

In particular, for $x = x_*$ we obtain

$$M(x_k,\eta_k) \le (f(x_*) - \eta_k)^2 .$$

But by hypothesis $\eta_k \le f(x_*)$; hence

$$\sqrt{M(x_k,\eta_k)} \le f(x_*) - \eta_k, \qquad
\eta_{k+1} = \eta_k + \sqrt{M(x_k,\eta_k)} \le f(x_*). \ ///$$


THEOREM 3.3.1. In the problem (1.6.1) let the set $X_*$ be non-empty,
let the function $M(x,\eta)$ be everywhere continuous, and let
the value $\eta_0 \le f(x_*)$ be known. Then the version suggested by
D. Morrison converges to the set $X_*$.
Proof. By Lemma 3.3.4 the monotonically increasing sequence
$\{\eta_k\}$ is bounded from above by the quantity $f(x_*)$. Hence the
limit exists:

$$\lim_{k\to\infty} \eta_k = \bar\eta \le f(x_*), \qquad x_* \in X_* . \tag{3.7}$$

We consider now the second auxiliary problem of finding

$$\min_{x\in X(\lambda)} f(x), \qquad X(\lambda) = \{x \in E^n : S(x) \le \lambda\}. \tag{3.8}$$

We denote by $X_*(\lambda)$ the set of solutions of this problem. Obviously,
$X_* = X_*(0)$, $X = X(0)$. Let $F(\lambda) = f(x)$, where
$x \in X_*(\lambda)$. We prove our Theorem, assuming that the function $F(\lambda)$
is right continuous at the point $\lambda = 0$, i.e., for any $\varepsilon > 0$
there exists $\delta > 0$ such that if $0 \le \lambda \le \delta$, then
$|F(\lambda) - F(0)| < \varepsilon$. Here $F(0) = f(x_*)$, $x_* \in X_*$.

To prove the assertion of the Theorem, we show that for any
positive $\varepsilon_1$, $\varepsilon_2$ we can find $N$ such that for all $k \ge N$ we
have the inequalities

$$f(x_*) - f(x_k) < \varepsilon_1, \qquad S(x_k) < \varepsilon_2 . \tag{3.9}$$

For $\varepsilon_1 > 0$ we take $\delta_1 > 0$ such that if $0 \le \lambda \le \delta_1$, then
$|F(\lambda) - F(0)| < \varepsilon_1$. It follows from (3.7) that there exists $N$
such that for all $k \ge N$ the condition

$$0 \le \eta_{k+1} - \eta_k < \min[\sqrt{\varepsilon_2}, \sqrt{\delta_1}]$$

is satisfied, yielding

$$\sqrt{M(x_k,\eta_k)} < \min[\sqrt{\varepsilon_2}, \sqrt{\delta_1}],$$

$$(f(x_k) - \eta_k)^2 + S(x_k) < \min[\varepsilon_2, \delta_1],$$

$$S(x_k) < \min[\varepsilon_2, \delta_1].$$

Therefore, $S(x_k) \le \delta_1$ for all $k \ge N$. Hence
$f(x_*) - f(x_k) < \varepsilon_1$ for all $k \ge N$. Noting the assertion of
Lemma 3.3.4, we arrive at the inequalities (3.9), thus proving
the Theorem. ///
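
The first version can be sketched in a few lines. Everything specific in the example below is an assumption made for illustration: the one-dimensional toy problem (minimize $f(x) = x$ subject to $1 - x \le 0$, with $S(x) = \max(0, 1-x)^2$), the ternary-search inner minimizer, the search interval, and the stopping tolerance.

```python
import math

# Toy problem (assumed): minimize f(x) = x subject to h(x) = 1 - x <= 0,
# with S(x) = max(0, 1 - x)^2.  Solution: x_* = 1, f(x_*) = 1.

def f(x):
    return x

def S(x):
    return max(0.0, 1.0 - x) ** 2

def M(x, eta):
    return (f(x) - eta) ** 2 + S(x)

def argmin_1d(fun, lo, hi, iters=200):
    """Ternary search; adequate here because M(., eta) is convex in x."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) <= fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def morrison(eta0, tol=1e-10, max_iter=100):
    """First version: eta_{k+1} = eta_k + sqrt(M(x_k, eta_k)), rule (3.6)."""
    eta = eta0
    x = None
    for _ in range(max_iter):
        x = argmin_1d(lambda z: M(z, eta), -10.0, 10.0)
        m = M(x, eta)
        if m <= tol:              # M(x_k, eta_k) = 0 up to the tolerance: stop
            break
        eta += math.sqrt(m)
    return x, eta

x_k, eta_k = morrison(eta0=-5.0)   # eta0 is a lower estimate of f(x_*)
print(x_k, eta_k)
```

On this example the parameter $\eta_k$ increases monotonically towards $f(x_*) = 1$ and the gap $f(x_*) - \eta_k$ shrinks by a constant factor per iteration, i.e. the convergence is linear.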


3. THE SECOND VERSION DUE TO D. MORRISON 


In his article [1] published in 1968, D. Morrison outlined one
more version of changing the parameter $\eta_k$:

$$\eta_{k+1} = \eta_k + \frac{M(x_k,\eta_k)}{f(x_k) - \eta_k} . \tag{3.10}$$

Later, many authors studied this version. One of the first among
those works was that of J. Kowalik, M. Osborne, and D. Ryan [1].


4. A VERSION DUE TO B.S. RAZUMIKHIN 


In the book of B.S. Razumikhin [1] the following rule for defining
the parameter is suggested:

$$\eta_{k+1} = f(x_k). \tag{3.11}$$


5. COMPUTATIONAL ASPECTS 


Let us compare the three versions of the method. We denote by
$\Delta_1$, $\Delta_2$, $\Delta_3$ the variations of the parameter $\eta$ on the $k$-th
iteration according to the rules (3.6), (3.10) and (3.11),
respectively:

$$\Delta_1 = \sqrt{M(x_k,\eta_k)}, \qquad \Delta_2 = \frac{M(x_k,\eta_k)}{f(x_k) - \eta_k}, \qquad \Delta_3 = f(x_k) - \eta_k .$$

It is obvious that $|\Delta_2| \ge |\Delta_1|$; furthermore, from the representation

$$\Delta_1 = \Delta_3 + \Bigl[\sqrt{(f(x_k) - \eta_k)^2 + S(x_k)} - (f(x_k) - \eta_k)\Bigr]$$

it follows that under the condition $f(x_k) > \eta_k$ we have the inequalities

$$\Delta_2 \ge \Delta_1 \ge \Delta_3 .$$

Thus, the auxiliary parameter $\eta$ increases maximally at each
iteration for the version (3.10), and minimally for the version (3.11).
Hence one might expect that the version (3.10) has the maximum
rate of convergence. The numerical computations confirm this
suggestion; at the same time, however, the numerical tests make
one notice the following. The properties of the cost-function
parametrization method given above were obtained under the assumption
that the auxiliary problem of minimization of the function
$M(x,\eta_k)$ has an exact solution; because of numerical errors we
can obtain $\eta_{k+1} > f(x_*)$. In that case the further calculations
according to either of the formulas (3.6), (3.10), (3.11) will
make no sense since they are justified only under the assumption
that $\eta_k \le f(x_*)$. Hence the accuracy of solution of the auxiliary
problem must be maximal when the version (3.10) is realized, and
minimal when the version (3.11) is realized. This circumstance
equalizes somewhat the computational time when distinct versions
are used.
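
The ordering of the three increments can be illustrated on a toy problem for which the minimizer of $M(x,\eta)$ is known exactly. The example below is an assumption of this sketch ($f(x) = x$, $S(x) = \max(0, 1-x)^2$, so $f(x_*) = 1$ and, for $\eta < 1$, $x_k = (1+\eta)/2$); the function name is illustrative.

```python
import math

# Toy problem (assumed): f(x) = x, S(x) = max(0, 1 - x)^2, f(x_*) = 1.
# For eta < 1 the exact minimizer of M(x, eta) is x_k = (1 + eta) / 2.

def increments(eta):
    x_k = 0.5 * (1.0 + eta)
    gap = x_k - eta                       # f(x_k) - eta_k
    m = gap ** 2 + (1.0 - x_k) ** 2       # M(x_k, eta_k)
    d1 = math.sqrt(m)                     # rule (3.6)
    d2 = m / gap                          # rule (3.10)
    d3 = gap                              # rule (3.11)
    return d1, d2, d3

d1, d2, d3 = increments(eta=0.0)
print(d1, d2, d3)
```

Here the rule (3.10) reaches $f(x_*)$ in a single step, which also shows why it is the most sensitive to errors in the inner minimization: any overestimate of $M(x_k,\eta_k)$ pushes $\eta_{k+1}$ above $f(x_*)$.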

The properties described in Section 3.1 imply that a solution
of the problem (1.6.1) can be found by defining the minimum
$\eta_k$ for which the condition

$$\min_{x\in E^n} M(x,\eta_k) = 0 \tag{3.12}$$

is satisfied. This reduction can be used for solving the problem (1.6.1)
completely, or partially, for example, in those cases
where for any reason the value $\eta_k > f(x_*)$ has been obtained,
and the values of $\eta_k$ need to be decreased until the condition
(3.12) is satisfied. Practical realization of this approach
causes considerable difficulties due to two circumstances:

first, for $\eta_k \ge f(x_*)$ the problem (3.12) has, as a
rule, a continuum of solutions, and even for a slight variation
of the parameter $\eta$ the solution of (3.12) frequently changes
considerably, which makes the computations more complex;

second, which is more important, in numerical computations
it is often hard to verify that the condition (3.12) is satisfied.
The verification of equality to zero requires a high accuracy
of minimization, which also complicates the computations.


6. THE COMPATIBILITY WITH THE PENALTY FUNCTION METHOD 


Let the penalty function be additive and have the form

$$S(x) = \sum_{i=1}^{e} (g^i(x))^2 + \sum_{j=1}^{c} (h^j_+(x))^2 .$$

Next we compare the two auxiliary problems of unconstrained
minimization of the functions $P(x,\tau)$ and $M(x,\eta)$ specified by the
formulas (1.3) and (3.1). We assume that the functions defining
the problem (1.6.1) are everywhere differentiable. Then the necessary
conditions for the minimum in $x$ are as follows:

$$P_x(x,\tau) = f_x(x) + 2\tau\Bigl[\sum_{i=1}^{e} g^i_x(x)\,g^i(x) + \sum_{j=1}^{c} h^j_x(x)\,h^j_+(x)\Bigr] = 0, \tag{3.13}$$

$$M_x(x,\eta) = 2(f(x) - \eta)\,f_x(x) + 2\Bigl[\sum_{i=1}^{e} g^i_x(x)\,g^i(x) + \sum_{j=1}^{c} h^j_x(x)\,h^j_+(x)\Bigr] = 0. \tag{3.14}$$

If for the penalty coefficient we take

$$\tau = \frac{1}{2(f(x) - \eta)}, \tag{3.15}$$

the conditions (3.13) and (3.14) coincide. Implementing a version
of the cost-function parametrization method, we obtain the sequence
$\{\eta_k\}$ and we define the sequence of penalty coefficients
$\{\tau_k\}$ using (3.15). In this case the sequences of sets of solutions
of the auxiliary problems coincide for both methods. The
parametrization method, hence, differs essentially from the method
of exterior penalty functions only in that in the former
method the policy of changing the auxiliary parameter $\eta$ has been
automatically defined, while in the method of the exterior penalty
functions the user must define specifically the rule for variation
of $\tau$. Hence it is hard to compare these methods: each implementation
of the cost-function parametrization method can be repeated
via the method of exterior penalty functions, using the formula
(3.15); and in a more special study of the problem in question one
can choose a more advantageous policy of changing the coefficient
$\tau$, which will lead to better results if the penalty function
method is used. On the other hand, for an inappropriate choice of
the sequence $\{\tau_k\}$ the computational results obtained through the
penalty function method will be inferior to those obtained through
the cost-function parametrization method. These circumstances
demonstrate how carefully one has to treat the so-called "numerical
tests" for investigating the comparative effectiveness of various
algorithms. If desired, it is possible to construct convincing
examples of the problems solved, showing either the advantage
of the cost-function parametrization method compared with the
method of exterior penalty functions, or the examples "proving"
the converse.
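
The correspondence (3.15) between the parameter $\eta$ and the penalty coefficient $\tau$ can be checked directly. The sketch below assumes a one-dimensional toy problem not taken from the text ($f(x) = x$, one inequality constraint $1 - x \le 0$, $S(x) = \max(0, 1-x)^2$), for which the stationary point of $M(\cdot,\eta)$ with $\eta < 1$ is $x = (1+\eta)/2$.

```python
# Toy problem (assumed): f(x) = x, S(x) = max(0, 1 - x)^2.

def grad_P(x, tau):
    """P_x for P(x, tau) = f(x) + tau * S(x), cf. (3.13)."""
    return 1.0 - 2.0 * tau * max(0.0, 1.0 - x)

def grad_M(x, eta):
    """M_x for M(x, eta) = (f(x) - eta)^2 + S(x), cf. (3.14)."""
    return 2.0 * (x - eta) - 2.0 * max(0.0, 1.0 - x)

eta = 0.2
x_eta = 0.5 * (1.0 + eta)            # stationary point of M(., eta) for eta < 1
tau = 1.0 / (2.0 * (x_eta - eta))    # penalty coefficient from rule (3.15)
print(grad_M(x_eta, eta), grad_P(x_eta, tau))
```

Both gradients vanish (up to rounding) at the same point, so in this example the penalty method run with $\tau$ from (3.15) reproduces the iterate of the parametrization method.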

In conclusion, we make several general remarks. The cost-function
parametrization method is more convenient than the method
of exterior penalty functions because in the former, first, the
rule of defining the sequence $\{\eta_k\}$ is more concrete and,
second, the auxiliary function $M$ is bounded from below by zero;
this removes one of the drawbacks of the penalty function
method, namely the possibility that the auxiliary function
is unbounded from below on an unfeasible set. At
the same time, the parametrization method has two disadvantages:
to implement this method, one needs to know a lower estimate of
the value $f(x_*)$; and the auxiliary problem of unconstrained minimization
usually has to be solved more accurately than in the exterior
penalty function method. In general, these methods are very similar;
they are best suited to preliminary, coarse calculations,
and hardly yield a high accuracy of solution.


7. GEOMETRIC INTERPRETATION 


For convenience of representation, we express the function $M(x,\eta)$
as follows:

$$M(x,\eta) = (f(x) - \eta)^2 + R^2(x), \qquad R(x) = \sqrt{S(x)} .$$

In Figure 1 we diagram the system of co-ordinates in which $R$ is
measured along the abscissa and $f$ along the ordinate.


Figure 1


If for all possible values of $x$ we consider the set of points
with the co-ordinates $[R(x), f(x)]$, we see that they form some
set $W$ (in Figure 1, the shaded area). The point $B$ having the
co-ordinates $[0, f(x_*)]$ corresponds to a solution of the problem
(1.6.1). On the plane $(R, f)$ the equation

$$(f - \eta_k)^2 + R^2 = \mathrm{const} \tag{3.16}$$

is a circle centered at the point $A_0$ with the co-ordinates
$[0, \eta_k]$. A solution of the auxiliary problem of minimizing
$M(x,\eta_k)$ in $x \in E^n$ can be interpreted, on the plane $(R, f)$, as
finding the circle (3.16) of the minimum radius having a common
point $C$ with the set $W$. The point $C$ thus obtained has the
co-ordinates $[R(x_k), f(x_k)]$. In Figure 1 we also show: the
point $A_1$ with the co-ordinates

$$\bigl[0,\ \eta_k + \sqrt{M(x_k,\eta_k)}\bigr];$$

the point $A_2$ with the co-ordinates

$$\Bigl[0,\ \eta_k + \frac{M(x_k,\eta_k)}{f(x_k) - \eta_k}\Bigr];$$

the point $A_3$ with the co-ordinates $[0, f(x_k)]$.
The point $A_2$ is obtained as the intersection of the ordinate axis
with the tangent to the circle at the point $C$.
In the first version of the method, the ordinate of the point $A_1$
is taken as $\eta_{k+1}$; in the second version this is the ordinate of
the point $A_2$; and in the third version, the ordinate of the
point $A_3$. In each version, upon finding $\eta_{k+1}$, the new center is projected
once more onto the set $W$, and so on, until the value
$\eta_k$ becomes sufficiently close to $f(x_*)$.


k 


4. THE INTERIOR PENALTY FUNCTION METHOD


1. GENERAL IDEAS OF THE METHOD


We consider the particular case of the problem (1.6.1) where the
feasible set is defined only through constraints of the inequality
type. Suppose we are seeking

$$\min_{x\in X} f(x), \qquad X = \{x \in E^n : h(x) \le 0\}. \tag{4.1}$$

As before, we denote by $X_*$ the set of solutions of this
problem. The set of interior points is $X_0 = \{x \in E^n : h(x) < 0\}$;
$\Gamma = X \setminus X_0$ is the boundary of the feasible set.

We introduce next the auxiliary function

$$P(x,t) = \mu(t)\,f(x) + T(t)\,b(x).$$


The continuous functions $\mu(t)$, $T(t)$ are defined for any
$t \ge 0$ and are such that for $t > 0$ the conditions

$$\mu(t) > 0, \qquad T(t) > 0, \qquad \lim_{t\to\infty} \frac{T(t)}{\mu(t)} = 0 \tag{4.2}$$

are satisfied. Everywhere on $X_0$ the function $b(x)$ is continuous
and $b(x) \ge 0$; for any infinite sequence of the points $\{x_j\}$
belonging to $X_0$ and converging to a point of $\Gamma$ we have the limit

$$\lim_{j\to\infty} b(x_j) = +\infty . \tag{4.3}$$

As very simple examples of the functions $b(x)$ satisfying
the above condition we can give

$$-\sum_{j=1}^{c} \frac{1}{h^j(x)}, \qquad \sum_{j=1}^{c} \frac{1}{(h^j(x))^2} .$$


The numerical method for solving the problem (4.1) is the
following. For an arbitrary increasing sequence $0 \le t_1 < t_2 < \dots$
which tends to infinity, the auxiliary problem of minimizing
$P(x,t_k)$ is solved. As a result, one obtains the sequence
of points $\{x_k\}$ satisfying the condition

$$x_k \in \operatorname{Arg\,min}_{x\in X_0} P(x,t_k). \tag{4.4}$$

Under certain conditions, each limit point of a converging
subsequence of the sequence $\{x_k\}$ belongs to $X_*$, i.e., we have
the convergence of the method described.

The condition (4.3) yields that as the point $x$ approaches
the boundary $\Gamma$, the values of $P(x,t)$ tend to infinity. Hence
the minimization of $P(x,t_k)$ in $x$ yields the points $x_k$ belonging
to the set $X$. The auxiliary problems (4.4) can be solved using
numerical methods of local minimization of functions of many
variables, starting the search from an arbitrary point $x_0 \in X_0$.
The function $b(x)$ is thus a peculiar "barrier," which justifies
the name the 'method of barriers' or 'interior penalty function
method.'
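
A minimal sketch of the barrier iteration (4.4) is given below. All concrete choices are assumptions made for illustration: the toy problem (minimize $f(x) = x$ subject to $h(x) = 1 - x \le 0$), the functions $\mu(t) = 1$, $T(t) = 1/t$, the barrier $b(x) = -1/h(x)$, the ternary-search minimizer and its search interval.

```python
# Toy problem (assumed): minimize f(x) = x subject to h(x) = 1 - x <= 0.
# Assumed choices: mu(t) = 1, T(t) = 1/t, b(x) = -1/h(x) = 1/(x - 1).

def P(x, t):
    return x + (1.0 / t) / (x - 1.0)

def argmin_interior(t, lo=1.0 + 1e-12, hi=10.0, iters=200):
    """Ternary search over interior points x > 1, where P(., t) is convex."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if P(m1, t) <= P(m2, t):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

points = [argmin_interior(t) for t in (1.0, 1e2, 1e4, 1e6)]
print(points)   # analytically the minimizer is x(t) = 1 + 1/sqrt(t)
```

Every iterate stays strictly inside the feasible set and approaches the solution $x_* = 1$ on the boundary as $t$ grows, which is the behaviour described above.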


2. PROOF OF CONVERGENCE 


For an arbitrary point $x_0 \in X_0$ we define two sets:

$$Q(t) = \{x \in E^n : P(x,t) \le P(x_0,t),\ x \in X_0\},$$
Bitte C E12) £ A) sor (tgs fe Go - 


THEOREM 3.4.1. Let the functions $f(x)$, $b(x)$ be continuous; let
the set of solutions $X_*$ of the problem (4.1) be non-empty, and
let for any $\varepsilon > 0$ there exist an $\varepsilon$-neighborhood $G_\varepsilon(X_*)$ of the
set $X_*$ such that $N = G_\varepsilon(X_*) \cap X_0 \ne \emptyset$; let the conditions (4.2)
and (4.3) be satisfied, and let the set $B(0)$ be compact. Then
the method of interior penalty functions converges to $X_*$ on $X_0$.
Prooty alt is Seen hat. thes nelusions —OCt ) ec EC tye ce BCOe 


Xen eB Cleo) neue Ctr) hold for atye. Ove O. ei hererore,. thessetyt x. 


* 


and, for all $t \ge 0$, the sets $Q(t)$, $B(t)$ are bounded, and the auxiliary problems of minimizing (4.4) are solvable. For any $x_k \in X_0$, $x_* \in X_*$ we have $f(x_k) \ge f(x_*)$, all the more

$$f(x_*) \le f(x_k) + \frac{\tau(t_k)}{\mu(t_k)}\, b(x_k). \tag{4.5}$$

All the elements of the sequence $\{x_k\}$ belong to the set $B(0)$. Hence the sequence $\{x_k\}$ is bounded and has a subsequence $\{x_{k_s}\}$ converging to the point $\bar x \in X$. It follows from the inequality (4.5) that

$$f(x_*) \le f(\bar x) + d, \qquad d = \lim_{s \to \infty} \frac{\tau(t_{k_s})}{\mu(t_{k_s})}\, b(x_{k_s}).$$


Lett Gee ¢ pt, daw Then Sd=s0" x sete since for 
d > 0..we.would have f(x.) > Cx e x € X, which is impossible. 
We show that

$$f(\bar x) > f(x_*) \tag{4.6}$$

does not hold. Assume the opposite: (4.6) holds. By hypothesis of the Theorem there exists a neighborhood $G_\varepsilon(X_*)$ such that $N \ne \emptyset$. Let $\varepsilon$ be so small that $\bar x \notin N$. We can always find the point $y \in N$ for which

$$f(x_*) \le f(y) < f(\bar x). \tag{4.7}$$


From the condition for defining the points $x_k$ it follows that

$$f(x_k) + \frac{\tau(t_k)}{\mu(t_k)}\, b(x_k) \le f(y) + \frac{\tau(t_k)}{\mu(t_k)}\, b(y).$$

Taking the values of $t_k$ corresponding to the subsequence $\{x_{k_s}\}$ and letting $k \to \infty$, we obtain $f(\bar x) + d \le f(y)$, which contradicts (4.6), (4.7). Hence (4.6) does not hold. Each limit point of the sequence $\{x_k\}$ belongs to $X_*$. ///
3. COMPUTATIONAL ASPECTS


In this method a sequence $\{x_k\}$ of interior points of the feasible set is constructed. This can be quite useful in those problems where it is not desirable to consider, for any reason, infeasible points. For example, some of the functions defining the problem (4.1) need not be defined outside the feasible set. At the same time, this method is not applicable to the problem (1.6.1) when the functions $g(x)$ are nonlinear and the interior of the feasible set is empty. In computer implementation the auxiliary functions are usually combined so that equality-type constraints are accounted for through exterior penalty functions, and inequality-type constraints through interior penalty functions. For the problem (1.6.1) it is possible, for example, to use the function

e 


Pe. D=fO+7 De Ol +!D Say 


i= 


where t > O during the computational procedure. A detailed 
study of such combined penalty functions can be found in 
AV LEaACCOmandmG see MeCorm:l Ckis| 44 7,u0an Gwen Hi POA ce [els |i 

The Czechoslovakian mathematician M. Hamala shows in [1] that in implementing the interior penalty function method one can get rid of the condition for the function $b(x)$ to tend to infinity as $x \to \Gamma$ and require instead that the norm of the gradient of $b(x)$ tend to infinity as $x \to \Gamma$. In that case, as an auxiliary function one can, for example, use the function:

$$P(x,\tau) = f(x) - \tau \sum_{j=1}^{c} \sqrt{-h^j(x)}.$$


In Fiacco and McCormick [1], and in Yu. G. Evtushenko [9], various "continuous" methods similar to (1.17) involving barrier functions have been suggested.

In the numerical implementation of the method, approximate solutions of the auxiliary problem (4.4) are found, with simplified versions analogous to those of the exterior penalty function method.


4. ESTIMATION OF ACCURACY


We consider the auxiliary function

$$P(x,\varepsilon) = f(x) - \varepsilon \sum_{j=1}^{c} \ln(-h^j(x)).$$


We use the method of asymptotic expansions described in Section 3.2. For a small parameter we take $\varepsilon$; as an approximate solution to the problem (4.4) we take any point $x$ satisfying the condition

$$\|P_x(x,\varepsilon)\| \le \varepsilon.$$

For a particular point $x = x(\varepsilon)$ we have

$$f_x(x(\varepsilon)) + h_x(x(\varepsilon))\, v(\varepsilon) = \varepsilon a, \qquad D(v(\varepsilon))\, h(x(\varepsilon)) = -\varepsilon l. \tag{4.8}$$

Here $a \in E^n$, $\|a\| \le 1$, and $l \in E^c$ denotes the vector all components of which are equal to unity.
We seek a solution to the system (4.8) as a series similar to (2.11). We substitute the series into (4.8) and expand the functions as a power series in $\varepsilon$; equating the expressions for $\varepsilon^0$ and those for $\varepsilon^1$ we obtain

$$f_x(x_*) + h_x(x_*) v_* = 0, \qquad v_*^j h^j(x_*) = 0,$$
$$L_{xx}(x_*, v_*)\, x_1 + h_x(x_*)\, v_1 = a, \tag{4.9}$$
$$D(v_*)\, h_x^T(x_*)\, x_1 + D(h(x_*))\, v_1 = -l.$$

Assuming that in the problem (4.1) the conditions of Lemma 4.1.2 are satisfied, we solve the system (4.9) and obtain the following asymptotic estimates:

$$x(\varepsilon) = x_* + \varepsilon L_{xx}^{-1}(x_*, v_*)\,(a - h_x(x_*) v_1) + O(\varepsilon^2),$$
$$v(\varepsilon) = v_* + \varepsilon v_1 + O(\varepsilon^2),$$
$$v_1 = -\left[D(h(x_*)) - D(v_*)\, h_x^T(x_*)\, L_{xx}^{-1}(x_*, v_*)\, h_x(x_*)\right]^{-1} \left[l + D(v_*)\, h_x^T(x_*)\, L_{xx}^{-1}(x_*, v_*)\, a\right].$$
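Dropping values of the order $\varepsilon^2$, we obtain that if $j \in \sigma(x_*)$, then

$$h^j(x_*) = 0, \qquad v_*^j \langle h_x^j(x_*), x_1 \rangle = -1, \qquad h^j(x_* + \varepsilon x_1) = -\frac{\varepsilon}{v_*^j};$$

if $j \notin \sigma(x_*)$, we have

$$h^j(x_*) < 0, \qquad v_*^j = 0, \qquad v_1^j = -\frac{1}{h^j(x_*)}.$$

These formulas yield

$$f(x_* + \varepsilon x_1) = f(x_*) + \varepsilon N,$$
$$P(x(\varepsilon), \varepsilon) = f(x_*) + \varepsilon \Big[ N + \sum_{j \in \sigma(x_*)} \ln \frac{v_*^j}{\varepsilon} - \sum_{j \notin \sigma(x_*)} \ln(-h^j(x_*)) \Big].$$

Here $N$ denotes the number of active constraints at the point $x_*$ (i.e., the number of those $h^j(x_*)$ for which $j \in \sigma(x_*)$).

The value of the objective function $f(x(\varepsilon))$ is therefore greater than that of $f(x_*)$ and tends to this value as $\varepsilon \to 0$.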

It is worth noting that here, as well as for exterior penalties, the values $f$, $h$, $P$ do not depend on the particular choice of the vector $a$ (up to $O(\varepsilon^2)$ terms).
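The first-order estimate for the logarithmic barrier, namely that $f(x(\varepsilon))$ exceeds $f(x_*)$ by approximately $\varepsilon$ times the number $N$ of active constraints, can be checked on a small example. The sketch below is only an illustration under assumptions of our own: the test problem (minimize $x^1 + x^2$ subject to $-x^1 \le 0$, $-x^2 \le 0$, $x^1 - 5 \le 0$, so that $x_* = (0,0)$ and $N = 2$) and the closed-form stationary point of $P(x,\varepsilon)$ are not taken from the text.

```python
import math


def log_barrier_minimizer(eps):
    """Stationary point of P = x1 + x2 - eps*(ln x1 + ln x2 + ln(5 - x1))."""
    # dP/dx2 = 1 - eps/x2 = 0
    x2 = eps
    # dP/dx1 = 1 - eps/x1 + eps/(5 - x1) = 0
    #   <=>  x1^2 - (5 + 2*eps)*x1 + 5*eps = 0; the smaller root lies in (0, 5)
    s = 5.0 + 2.0 * eps
    x1 = 0.5 * (s - math.sqrt(s * s - 20.0 * eps))
    return x1, x2


def objective_gap_ratio(eps):
    """(f(x(eps)) - f(x*)) / eps, where f(x*) = 0 for this problem."""
    x1, x2 = log_barrier_minimizer(eps)
    return (x1 + x2) / eps


ratios = [objective_gap_ratio(e) for e in (1e-1, 1e-2, 1e-3)]
```

The ratios approach 2, the number of active constraints; the inactive constraint $x^1 - 5 \le 0$ affects only the higher-order terms.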


Many interesting estimates follow easily from the Young- 


Minkowski inequalities given in Subsection 3.2.2. Letting 
Ta = -h(x), Sy, ==, Bor ah Ta, 


we obtain 


eC — c . -1 
f(x,) + 277 t Wi £* f( sda g oe ul ho Gd 
deed. 


i=1 
siete 
a = -h(x)t, TD = ey) a = $, B=-1 , 
then 
Copeincs Sesfiaia = 
as a) iV! < fk) - 2V¥t ) N-hi (x) 
i=1 i=l 
af 
ta, = = bat my oae Zbt = tv, a = -1, B= 4% 
then 
Cis eS i i= 
A(x et geet 7) Teles (RE Ty bel 
i=1 Vz i=1 


It is possible to introduce the scaling vector Z in all 
other formulas as well, but we shall not do it because this vec-— 
tor makes the formulas very complex and carries no additional 
information. We can always assume that h is a scaled vector. 
Therefore instead of the vector Ze«R° we use the Sada cama 


everywhere. 
For exterior penalties, the use of the Hölder inequality led us to exact penalties. We give a similar result for interior penalties. Let
a= Cx). b= "v5 Q = =1, B=% 


men; fortanyu.0 erties ||v,/by! x =k, “wevhave 
2 


Cc i eae -1 
x a aici aa [| J [-h°(x)] 
i=l 
FOrs 0. Sus 6 el Land. anys 0 =. = Sole lege x 4%. we 
have 


Cc ——} 2 
ECx peed | ) V-ni Go| 
i=1 


Taking this approach, we can obtain many other estimates. 
5. EXTRAPOLATION 


The purpose of extrapolation in the implementation of the penalty function method was elucidated in Subsection 3.2.4. Thus we discuss it here only briefly. For a coefficient $\varepsilon_k$, suppose that the vectors $x_k$, $v_k$ have been found, which satisfy the conditions

$$L_x(x_k, v_k) = f_x(x_k) + h_x(x_k)\, v_k = \varepsilon_k a_k, \qquad D(v_k)\, h(x_k) + \varepsilon_k l = 0,$$

where $\|a_k\| \le 1$. Next, we introduce a small parameter $\mu$ and consider the system

$$f_x(x) + h_x(x)\, v = (\varepsilon_k - \mu)\, a_k, \qquad D(v)\, h(x) = -(\varepsilon_k - \mu)\, l. \tag{4.10}$$
We seek a solution to (4.10) as a series analogous to (2.23). Making standard computations, we find the following asymptotic estimates of the solutions of the system (4.10):

$$x(\mu) = x_k - \mu L_{xx}^{-1}(x_k, v_k)\,(a_k + h_x(x_k)\, v_1) + O(\mu^2),$$
$$v(\mu) = v_k + \mu v_1 + O(\mu^2),$$
$$v_1 = \left[D(h(x_k)) - D(v_k)\, h_x^T(x_k)\, L_{xx}^{-1}(x_k, v_k)\, h_x(x_k)\right]^{-1} \left(l + D(v_k)\, h_x^T(x_k)\, L_{xx}^{-1}(x_k, v_k)\, a_k\right).$$


If the computational procedure following the interior penalty function method terminates upon finding $x_k$, then, by the formulas obtained, as the improved values of $x$ and $v$ we need to take

$$x_{k+1} = x_k - L_{xx}^{-1}(x_k, v_k)\left[L_x(x_k, v_k) + h_x(x_k)\, \bar v\right], \tag{4.11}$$
$$v_{k+1} = v_k + \bar v,$$
$$\bar v = \left[D(h(x_k)) - D(v_k)\, h_x^T(x_k)\, L_{xx}^{-1}(x_k, v_k)\, h_x(x_k)\right]^{-1} D(v_k)\left[h_x^T(x_k)\, L_{xx}^{-1}(x_k, v_k)\, L_x(x_k, v_k) - h(x_k)\right]. \tag{4.12}$$

If we consider these formulas as some iterative method, we easily show that this method coincides with Newton's method applied to the solution of the system $L_x(x,v) = 0$, $D(v)\, h(x) = 0$.
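Newton's method for the system $L_x(x,v) = 0$, $D(v)h(x) = 0$ can be sketched on a scalar example. The test problem (minimize $x^2$ subject to $h(x) = 1 - x \le 0$, with solution $x_* = 1$, $v_* = 2$), the starting point taken from the logarithmic barrier with $\varepsilon = 0.1$, and the function name are assumptions made for this illustration only.

```python
import math


def newton_kkt(x, v, iterations=6):
    """Newton's method for L_x = 2x - v = 0, v*h(x) = v*(1 - x) = 0."""
    history = [(x, v)]
    for _ in range(iterations):
        r1 = 2.0 * x - v            # L_x(x, v) with L = x^2 + v*(1 - x)
        r2 = v * (1.0 - x)          # D(v) h(x)
        # Jacobian of (r1, r2) with respect to (x, v): [[2, -1], [-v, 1 - x]]
        a11, a12, a21, a22 = 2.0, -1.0, -v, 1.0 - x
        det = a11 * a22 - a12 * a21
        # Cramer's rule for the 2x2 system J * (dx, dv) = -(r1, r2)
        dx = (-r1 * a22 + a12 * r2) / det
        dv = (-a11 * r2 + a21 * r1) / det
        x, v = x + dx, v + dv
        history.append((x, v))
    return history


# Barrier point for eps = 0.1: 2x(x - 1) = eps, v = eps / (x - 1) = 2x.
eps = 0.1
x0 = 0.5 * (1.0 + math.sqrt(1.0 + 2.0 * eps))
history = newton_kkt(x0, 2.0 * x0)
x_final, v_final = history[-1]
```

Starting from the barrier point, the error drops from about $5 \cdot 10^{-2}$ to rounding level within a few steps, which is the behavior the extrapolation is meant to exploit.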

We can now use the auxiliary function 


ce 


1 
P (x, a) (2: eta aia (4.13) 


j=l 





Then the following system plays the role of (4.10): 


fa(2)-+ 3 (PHL) =(e4—B) 


D (w)h (x) =—(e,—p) I, wes. 


It is not hard at all to obtain all the necessary computational formulas; an analog of the method (4.11), (4.12) in this case is given by the method (4.1.11), which will be obtained and studied in our next chapter, using, however, different considerations.

In the same way as in Section 3.2, we can simplify the extrapolation, using the formula (2.26). For the auxiliary function (4.13), the employment of the formula (2.26) will correspond to the linear dependence of x, v on the square root of the true


penalty coefficient equal to ao ara Glee le) 


5. THE LINEARIZATION METHOD 


Currently, there are many versions of the linearization method; most of them can be interpreted as implementations of the exterior penalty function method using non-differentiable penalties. This is the reason why we have described the linearization method in the chapter dealing with the exterior penalty functions. We shall give later several versions of the method, omitting, however, their justification. A detailed description of all the methods, including the proof of their convergence, can be found in the References. We shall consider this method for the problem (1.6.1), using the auxiliary function (2.1), although in very many works the problem (4.1) is considered, and either $P_1(x,\tau)$ or $P_3(x,\tau)$ are auxiliary functions.


1. THE GENERAL IDEA OF THE METHOD 


For many mathematicians working in mechanical engineering, the most appropriate approach to solving the problem (1.6.1) is to use the linearization method. This method implies the following. The functions defining the problem (1.6.1) are linearized at the point $x_k$; the linear programming problem is considered:

$$\min_{x \in X_k} \langle f_x(x_k), x - x_k \rangle, \tag{5.1}$$
$$X_k = \{x \in E^n : g(x_k) + g_x^T(x_k)(x - x_k) = 0,\ h(x_k) + h_x^T(x_k)(x - x_k) \le 0\}.$$


This problem can have unbounded solutions, hence usually additional constraints are introduced:


scare (weet dee il oe 
k 
Having solved the auxiliary problem, one finds the vector $x = \bar x_k$ and passes to the point

$$x_{k+1} = x_k + \alpha_k (\bar x_k - x_k). \tag{5.2}$$

This process is repeated at the new point $x_{k+1}$, and so on. Unfortunately, this method converges very slowly. It may seem that in order to obtain a high rate of convergence one needs a quadratic approximation of all the functions defining the problem. It has been found that this is not quite true: a quadratic rate of convergence can be obtained using linear approximations of the constraints plus taking a quadratic function as a cost function: instead of the problem (5.1) the problem

$$\min_{x \in X_k} \left[ \langle f_x(x_k), x - x_k \rangle + \tfrac{1}{2} (x - x_k)^T N_k (x - x_k) \right] \tag{5.3}$$

needs to be solved.


Numerous versions of the linearization method differ in the rules for determining the matrices $N_k$ and the variables $\alpha_k$. We list here several versions. We denote by $\bar x_k$ the vectors $x$ found by solving the problem (5.3), and denote by $\bar u_k$ and $\bar v_k$ the respective Lagrange multipliers. Also, we put $y_k = [u_k, v_k]$, $z_k = [x_k, y_k]$, $L(x_k, u_k, v_k) = L(z_k)$. We assume throughout that in the problem (1.6.1) there exists a saddle point $z_* = [x_*, u_*, v_*]$ of the Lagrangian $L(x,u,v)$.


2. A VERSION DUE TO R. WILSON


Immo pandsiGon sopiet 


O7L - r i 
Goa We N,= (xp a 1, Up 1) a 


Under standard assumptions, Wilson [1] shows that this process has a local quadratic rate of convergence. To avoid the computation of matrices of the second derivatives, various simplified versions of the method have been devised.
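An iteration of this kind can be sketched for a problem with a single equality constraint. The sketch rests on assumptions that are ours rather than the text's: the test problem (minimize $x^1 + x^2$ subject to $(x^1)^2 + (x^2)^2 - 2 = 0$, with solution $x_* = (-1,-1)$, $u_* = 1/2$), the choice of $N_k$ as the Hessian of the Lagrangian at the current pair $(x_k, u_k)$, the full step $\alpha_k = 1$, and the helper names. With only an equality constraint the quadratic subproblem (5.3) reduces to one linear system for the step $p$ and the new multiplier.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def sqp_step(x, u):
    """One full step for min x1 + x2 s.t. g(x) = x1^2 + x2^2 - 2 = 0."""
    fx = [1.0, 1.0]                          # gradient of the objective
    g = x[0] ** 2 + x[1] ** 2 - 2.0          # constraint value
    gx = [2.0 * x[0], 2.0 * x[1]]            # constraint gradient
    N = [[2.0 * u, 0.0], [0.0, 2.0 * u]]     # Hessian in x of L = f + u*g
    # Optimality conditions of the subproblem:
    #   N p + gx * u_new = -fx,   <gx, p> = -g
    K = [[N[0][0], N[0][1], gx[0]],
         [N[1][0], N[1][1], gx[1]],
         [gx[0], gx[1], 0.0]]
    p1, p2, u_new = solve_linear(K, [-fx[0], -fx[1], -g])
    return [x[0] + p1, x[1] + p2], u_new


x, u = [-1.2, -0.8], 0.4
for _ in range(8):
    x, u = sqp_step(x, u)
```

The multiplier of the subproblem is reused as the next multiplier estimate, which is what keeps the local convergence fast in this example.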


3. VERSIONS DUE TO U. GARCIA PALOMARES AND O. MANGASARIAN [1] 


These authors suggest the following methods for choosing $\alpha_k$ and $N_k$:

1) $\alpha_k = 1$, the matrix $N_k$ is such that $\|N_k - L_{xx}(z_k)\| \le \sigma$;

2) $\alpha_k = 1$, the matrix $N_k$ is such that

$$\|N_k - L_{xx}(z_k)\| \le \sigma, \qquad \lim_{k \to \infty} \frac{\|(N_k - L_{xx}(z_k))(x_{k+1} - x_k)\|}{\|z_{k+1} - z_k\|} = 0.$$


When the problem (5.3) has a non-unique solution, instead of $\bar z_k$ a point of the set of solutions is taken at which the norm $\|\bar z_k - z_k\|$ is minimum. The variable $\sigma$ is expressed in terms of derivatives of the functions defining the problem, at the point $x_k$. It is proved that if conditions similar to those of Lemma 4.1.1 are satisfied and the problem (1.6.1) has only inequality constraints, and furthermore, if the first rule is used, then the method has a linear rate of convergence; if the second rule is used, a superlinear rate of convergence.


4. THE FIRST VERSION DUE TO S. HAN


We assume that $\alpha_k = 1$. We define the matrix $N_k$ more precisely during the iteration procedure by trying to approximate the matrix $L_{xx}(z_k)$. To this end, we can exploit the various quasi-Newton formulas obtained in Section 2.5. We make use, for example, of Powell's symmetric version of Broyden's method (2.5.38). Let

$$\Delta x_k = x_{k+1} - x_k, \qquad \Delta\gamma_k = L_x(x_{k+1}, u_{k+1}, v_{k+1}) - L_x(x_k, u_{k+1}, v_{k+1}).$$

Then

$$N_{k+1} = N_k + \frac{(\Delta\gamma_k - N_k \Delta x_k)\,\Delta x_k^T + \Delta x_k\,(\Delta\gamma_k - N_k \Delta x_k)^T}{\langle \Delta x_k, \Delta x_k \rangle} - \frac{\langle \Delta\gamma_k - N_k \Delta x_k, \Delta x_k \rangle}{\langle \Delta x_k, \Delta x_k \rangle^2}\, \Delta x_k \Delta x_k^T.$$

In a similar way it is possible to apply many other formulas for updating the matrix $N_k$. In [1] Han proves the superlinear rate of convergence of this version, making standard assumptions for the problem (4.1).
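Powell's symmetric version of Broyden's update can be written in a few lines. The sketch below uses the standard form of that update; the function name, the argument names `dx` and `dg` (standing for the step and the difference of Lagrangian gradients) and the sample data are assumptions made for illustration. The defining properties are that the updated matrix stays symmetric and satisfies the secant equation, that is, it maps `dx` to `dg`.

```python
def psb_update(N, dx, dg):
    """Powell symmetric Broyden update of a symmetric matrix N (list of rows)."""
    n = len(dx)
    Ndx = [sum(N[i][j] * dx[j] for j in range(n)) for i in range(n)]
    r = [dg[i] - Ndx[i] for i in range(n)]        # residual of the secant equation
    s = sum(d * d for d in dx)                    # <dx, dx>
    rs = sum(r[i] * dx[i] for i in range(n))      # <r, dx>
    return [[N[i][j]
             + (r[i] * dx[j] + dx[i] * r[j]) / s  # symmetric rank-two correction
             - rs * dx[i] * dx[j] / (s * s)       # rank-one term restoring the secant equation
             for j in range(n)] for i in range(n)]


N0 = [[1.0, 0.0], [0.0, 1.0]]
dx = [1.0, 2.0]
dg = [3.0, 1.0]
N1 = psb_update(N0, dx, dg)
# Product N1 * dx, which should reproduce dg.
N1_dx = [sum(N1[i][j] * dx[j] for j in range(2)) for i in range(2)]
```

For the sample data the update gives `N1 = [[1.8, 0.6], [0.6, 0.2]]` up to rounding, and `N1_dx` reproduces `dg`.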
5. THE SECOND VERSION DUE TO S. HAN


All the versions described so far are only locally convergent. To expand the domain of convergence, one needs to introduce a special choice of the step $\alpha_k$. To do this, the non-differentiable auxiliary function (2.1) can be used. We define $\alpha_k$ from the condition for approximate one-dimensional minimization of the auxiliary function:

$$P(x_{k+1}, \tau) \le \min_{0 \le \alpha \le \delta} P(x_k + \alpha(\bar x_k - x_k), \tau) + \nu_k. \tag{5.4}$$

Here $\{\nu_k\}$ is a sequence of nonnegative numbers such that $\sum_{k=1}^{\infty} \nu_k < \infty$; $\delta$, $\tau$ are positive constants; $\bar x_k$ denotes, as before, the solution to the problem (5.3); the matrices $N_k$ are updated by any quasi-Newton formula. The implementation of this version of the method is not much more complicated than the preceding version: it includes only the line search (5.4). At the same time, as Han has shown in [2], his version of the method ensures global convergence if (4.1) is a convex programming problem, Slater's condition is satisfied, $\tau$ is sufficiently large, and the function $P_1$ is substituted for $P$.

The computations described above can be regarded as the implementation of the exterior penalty function method, but instead of minimization of the non-differentiable function $P(x,\tau)$, the quadratic programming problems (5.3) are solved first and as the next step a one-dimensional minimization of $P$ is performed.
On the other hand, this version can be regarded as a special me- 
thod of unconstrained minimization of the non-differentiable func- 
enor (2,16) Wisk Ske 


A dual problem to the problem (5.3) (see the problem (1.6.7)) consists in finding

$$\max_{u \in E^e} \max_{v \in E^c_+} \min_{p \in E^n} \left[ \langle f_x(x_k), p \rangle + \tfrac{1}{2} p^T N_k p + \langle u, g(x_k) + g_x^T(x_k) p \rangle + \langle v, h(x_k) + h_x^T(x_k) p \rangle \right], \tag{5.5}$$

where $p = x - x_k$. If the matrix $N_k$ is nonsingular, the interior problem can easily be solved:

$$p = -N_k^{-1} L_x(x_k, u, v). \tag{5.6}$$

Substituting this expression into (5.5), we arrive at the exterior problem of finding

$$\max_{u \in E^e} \max_{v \in E^c_+} \left[ L(x_k, u, v) - f(x_k) - \tfrac{1}{2} L_x^T(x_k, u, v)\, N_k^{-1} L_x(x_k, u, v) \right]. \tag{5.7}$$

Let $[\bar u_k, \bar v_k]$ be the solution to the problem (5.7). Then by (5.6) we have

$$\bar x_k = x_k - N_k^{-1} L_x(x_k, \bar u_k, \bar v_k).$$

Therefore, the quadratic programming problem (5.3) can be replaced by that of solving the maximization problem (5.7) which has only simple constraints.


6. A VERSION DUE TO B. N. PSHENICHNYJ


By hypothesis, N Peo Lhe Sibien is defined as follows. 


ceo a Oy 


One finds the first value s = 0,1,... for which the inequality 
ie 254 = 
P (apa ete) te) SP (ns HY — Bee <0 <I 


Lis satisfied. If this inequality is satisfied for s = Sy Oe 


-s 
ieCpihi Strode Lee Sincs sumed nm COM) mr bilalt = me es simpli- 
fy the auxiliary problem (5.3), only those constraints in which 


the feasibility has been violated maximally are taken into account. 
To do this, one finds the quantity $S^*(x_k)$ (see the formula (122)) and the feasible set $X_k$ in the problem (5.3) is given by

$$g^i(x_k) + \langle g_x^i(x_k), x - x_k \rangle = 0, \quad i \in W_1(x_k),$$
$$h^j(x_k) + \langle h_x^j(x_k), x - x_k \rangle \le 0, \quad j \in W_2(x_k), \tag{5.8}$$

where the index sets are used:

$$W_1(x_k) = \{i \in [1:e] : |g^i(x_k)| \ge S^*(x_k) - \delta_k\},$$
$$W_2(x_k) = \{j \in [1:c] : h^j(x_k) \ge S^*(x_k) - \delta_k\}.$$

The initial value $\delta_0 > 0$ is specified by the user; next, it is halved each time the auxiliary problem has no solution.


The rule for the coefficient T) is as follows: if the sum of 


the moduli of the components of the dual vector, corresponding to 


f 


Therconstradnts iG) <cpracis slesss than eat the value Ty is 


ke 
doubled; otherwise Tq Ses 

The convergence of the method is proved under the assumption 
that P(%,T) is an auxiliary function and the gradients of the 


functions defining the problem satisfy a Lipschitz condition. 


The proof can be found in B. N. Pshenichnyj and Yu. M. Danilin [1]. 


7. VERSIONS DUE TO A. I. GOLIKOV AND V. G. ZHADAN [2]


These authors introduce the function M(x) which is continuous 
on E” and satisfies the condition M(x) >1 + Yn ||f -() [lo etn 
the first version, the following linear programming problem is 


solved: oteelnt CL 


min min Kt), x-x,) + M(x,) aq] ; Cag) 
x q 


where q6é Be the conditions (5.8) hold and, furthermore, 
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Pea Foeae | erie cee eee nt ee (5.10) 


We denote by Ki ay, the solution to this problem, 
The new point*is defined by the formula similar to 


a = x, +a C5 pall) 


Eat k KP; 


The step $\alpha_k$ is chosen by reducing the initial step until the condition

$$P_3(x_{k+1}, \tau) \le P_3(x_k, \tau) + \beta \alpha_k \left[ \langle f_x(x_k), p_k \rangle - \tau S^*(x_k) \right] \tag{5.12}$$

is satisfied. The parameter $0 < \beta < 1$ and the initial step $\alpha$ are fixed.

In the second version, the following linear programming problem is solved: find

$$\min_{a} \min_{b} \min_{q} \left[ \langle f_x(x_k), a - b \rangle + M(x_k)\, q \right], \tag{5.13}$$

where $a \in E^n_+$, $b \in E^n_+$, $q \in E^1$; the feasible set is given by

$$g^i(x_k) + \langle g_x^i(x_k), a - b \rangle = 0, \quad i \in W_1(x_k),$$
$$h^j(x_k) + \langle h_x^j(x_k), a - b \rangle \le 0, \quad j \in W_2(x_k),$$
$$\sum_{i=1}^{n} (a^i + b^i) \le 1 + q. \tag{5.14}$$

This problem is somewhat simpler than (5.9) since only one constraint (5.14) is used instead of $n$ conditions (5.10); the new condition for $a$, $b$ to be nonnegative is easily accounted for. The point $x_{k+1}$ is defined from (5.11), where $p_k = a_k - b_k$, the step being found from (5.12).


It is easy to verify that the dual problem to (5.9) is equi- 
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valent (in the sense of equivalence of optimal values of the dual 
variables corresponding to linearized constraints) to the follow- 


ing problem: 


imax | elk GE Kae ee OIE Le 1) 
je Ws ( 











u,veU, te W, (x,) 2 Xp) 
—IWfe(%)+ Dd  ulgi(x,) + DY vihl (x,) ; (5.15) 
te W, (x,) fe Wy (xp) 1 
where 
Uy= fu, > 0: 1€ Wr (x), | Wo (%), 
[fe Ga) + y ugi(x,+ DY vihl(x,)| <M (x)\ : 
tewW, (x,) jew, (x,) 1 








The dual problem of (5.13) is the same as the last problem, with the only difference that instead of the norm $\|\cdot\|_1$ the Chebyshev norm $\|\cdot\|_\infty$ is needed. The cost functions in (5.7) and (5.15) are close to each other and differ only in the norms of the gradient $L_x(x,u,v)$. We will be discussing the methods of this kind in Section 4.6, where we use modified Lagrangians. The convergence of both versions can be proved under assumptions close to those used in Section 3.4. It can be found in A. I. Golikov and V. G. Zhadan [2].


Chapter 4 


NUMERICAL METHODS FOR SOLVING 
NONLINEAR PROGRAMMING PROBLEMS 
USING MODIFIED LAGRANGIANS 


The impediments in solving accurately many engineering problems 
using the penalty function method have impelled the development of 
numerical methods with geometric and quadratic convergence rates. 
The use of modified Lagrangians for solving nonlinear programming 
problems was suggested for the first time in Arrow, Hurwicz, and 
Uzawa [1]. Later on, this idea expanded and developed into a 
method with a geometric convergence rate. The first published 
works in this direction are Hestenes [1], Powell [1], Haarhoff
and Buys [1], followed by Rockafellar [2], [3], Tret'yakov [1],
Polyak and Tret'yakov [1], Gol'shtein and Tret'yakov [1],
Bertsekas [1] - [3], Kort and Bertsekas [1], and many other authors.
This method is frequently regarded as an independent method for 
solving nonlinear programming problems; Polyak and Tret'yakov call 
it "penalty-estimate method"; Rockafellar and Bertsekas call it 
the "method of multipliers," and other authors, the "augmented 
Lagrangian method." 

We shall discuss this method in Section 4.3 (method (3.2)), taking a still different approach, that is the reduction of the initial nonlinear programming problem (1.6.1) to that of finding a local maximin, minimax, saddle points and solving systems of nonlinear equations, using modified Lagrangians. This problem is then solved via the well-known methods for problems of this class. Under this approach, the method (3.2) is no more than simple iteration applied to solving a system of nonlinear equations.

This reduction allows us to obtain a variety of methods for solv- 


ing nonlinear programming problems. 


1. THE SIMPLEST MODIFICATION OF THE LAGRANGIAN 


1. PRELIMINARY RESULTS


Throughout this chapter we consider the general problem of nonlinear programming (1.6.1). By Theorem 1.6.1, under certain conditions, solving the problem (1.6.1) can be replaced by finding saddle points of the Lagrangian. The numerical methods of finding the saddle points are inappropriate for this purpose, because they are intended for finding unconstrained solutions, whereas in (1.6.9) we have the constraint $v \ge 0$ to account for. To overcome this difficulty, we modify the Lagrangian by introducing

$$F(x,u,w) = f(x) + \langle g(x), u \rangle + \sum_{j=1}^{c} (w^j)^2 h^j(x), \qquad u \in E^e,\ w \in E^c. \tag{1.1}$$

Let

$$y = [u, w] \in E^m, \quad m = e + c, \quad z = [x, y] \in E^{n+m}, \quad F(x,u,w) = F(x,y) = F(z).$$
Let the functions defining the problem be differentiable, and in the problem (1.6.1) let the Kuhn-Tucker point $[x_*, u_*, v_*]$ exist. Then we define $z_* = [x_*, u_*, w_*]$, where $(w_*^j)^2 = v_*^j$ for all $j \in [1:c]$. From the Kuhn-Tucker conditions we see that $z_*$ is a stationary point of $F(z)$, i.e.,

$$F_z(z_*) = 0. \tag{1.2}$$
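The stationarity property (1.2) is easy to check numerically. The following sketch is an illustration under our own assumptions: the test problem (minimize $(x^1)^2 + (x^2)^2$ subject to $g(x) = x^1 + x^2 - 2 = 0$ and $h(x) = 1.5 - x^1 \le 0$, whose Kuhn-Tucker point is $x_* = (1.5, 0.5)$, $u_* = -1$, $v_* = 2$), the choice $w_* = \sqrt{v_*}$, and the central-difference gradient are not taken from the text.

```python
import math


def F(x, u, w):
    """Modified Lagrangian F = f + u*g + w^2*h for the illustrative problem."""
    f = x[0] ** 2 + x[1] ** 2
    g = x[0] + x[1] - 2.0      # equality constraint g(x) = 0
    h = 1.5 - x[0]             # inequality constraint h(x) <= 0
    return f + u * g + w * w * h


def gradient(z, step=1e-6):
    """Central-difference gradient of F with respect to z = [x1, x2, u, w]."""
    grad = []
    for i in range(len(z)):
        zp = list(z)
        zm = list(z)
        zp[i] += step
        zm[i] -= step
        diff = F(zp[:2], zp[2], zp[3]) - F(zm[:2], zm[2], zm[3])
        grad.append(diff / (2.0 * step))
    return grad


# z* = [x*, u*, w*] with (w*)^2 = v* = 2
z_star = [1.5, 0.5, -1.0, math.sqrt(2.0)]
grad_at_star = gradient(z_star)
```

All four components of `grad_at_star` vanish up to rounding error, although no sign restriction is imposed on $w$.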


This property allows us to reduce the problem (1.6.1) to finding solutions of (1.2). Repeating the proof of Theorem 1.6.1 almost verbatim we can show that if $z_* = [x_*, u_*, w_*]$ is an unconstrained saddle point of $F$, then $x_* \in X_*$ and $w_*^j h^j(x_*) = 0$. If the functions defining the problem are differentiable, then the saddle point $z_*$ of $F$ satisfies (1.2). However, the use of numerical methods of finding saddle points is complicated by the fact that $F$ is not convex/concave, and this condition is really essential for ensuring convergence of many methods for finding saddle points. Hence it is appropriate to consider the problem of finding the local maximin

$$\max_{u \in E^e} \max_{w \in E^c} \min_{x \in E^n} F(x, u, w). \tag{1.3}$$


One can solve it using the methods in Section 2.6. We show that under standard assumptions, the matrix $F_{zz}(z_*)$ is nonsingular at the Kuhn-Tucker points and $F$ has a local maximin. For this we introduce three auxiliary square matrices of orders $n+m$, $n+m$ and $m$, respectively:

$$F_{zz}(z) = \begin{pmatrix} F_{xx}(z) & g_x(x) & 2 h_x(x) D(w) \\ g_x^T(x) & 0_{ee} & 0_{ec} \\ 2 D(w) h_x^T(x) & 0_{ce} & 2 D(h(x)) \end{pmatrix},$$

$$H(z) = \begin{pmatrix} -F_{xx}(z) & -g_x(x) & -2 h_x(x) D(w) \\ g_x^T(x) & 0_{ee} & 0_{ec} \\ 2 D(w) h_x^T(x) & 0_{ce} & 2 D(h(x)) \end{pmatrix},$$

$$N(z) = [F_{xu}(z) \mid F_{xw}(z)]^T F_{xx}^{-1}(z)\, [F_{xu}(z) \mid F_{xw}(z)] - \begin{pmatrix} F_{uu}(z) & F_{uw}(z) \\ F_{wu}(z) & F_{ww}(z) \end{pmatrix}. \tag{1.4}$$
LEMMA 4.1.1. Let the sufficient conditions for a minimum given by Theorem 1.7.4 be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$ of the problem (1.6.1) and let the constraint qualification hold at $x_*$. Then the matrix $F_{zz}(z_*)$, where $z_* = [x_*, u_*, w_*]$, is nonsingular.


To show it is nonsingular, we have only to show that its null space is zero:

$$F_{zz}(z_*)\, z = 0, \qquad z = [x, u, w]. \tag{1.5}$$

Written out in detail this system is

$$F_{xx}(z_*)\, x + g_x(x_*)\, u + 2 h_x(x_*) D(w_*)\, w = 0, \tag{1.6}$$
$$g_x^T(x_*)\, x = 0, \qquad D(w_*)\, h_x^T(x_*)\, x + D(h(x_*))\, w = 0. \tag{1.7}$$

From (1.7) it follows that for $j \in \sigma(x_*)$ we have $h^j(x_*) = 0$ and, since $w_*^j \ne 0$, $\langle h_x^j(x_*), x \rangle = 0$. For $j \notin \sigma(x_*)$ we have $h^j(x_*) < 0$, $w_*^j = 0$, and hence $w^j = 0$. In both cases

$$h^j(x_*)\, w^j = 0, \qquad D(w_*)\, h_x^T(x_*)\, x = 0.$$

Assume $\|x\| \ne 0$. Then multiplying (1.6) on the left by $x^T$ and noting (1.7), we obtain

$$x^T F_{xx}(z_*)\, x = 0, \tag{1.8}$$

where $x$ satisfies the conditions

$$g_x^T(x_*)\, x = 0, \qquad \langle h_x^j(x_*), x \rangle = 0, \quad j \in \sigma(x_*).$$

To each Kuhn-Tucker point $[x_*, u_*, v_*]$ there corresponds a point $z_* = [x_*, u_*, w_*]$. Hence $F_{xx}(z_*) = L_{xx}(x_*, u_*, v_*)$ and from (1.8) we have

$$x^T L_{xx}(x_*, u_*, v_*)\, x = 0.$$

But this equality contradicts the positive definiteness of the matrix $L_{xx}(x_*, u_*, v_*)$ on $K(x_*)$ (see the condition (1.7.16)). Hence $x = 0$. From (1.6) and (1.7) it then follows that

$$g_x(x_*)\, u + 2 \sum_{j \in \sigma(x_*)} h_x^j(x_*)\, w_*^j w^j = 0, \qquad h^j(x_*)\, w^j = 0.$$

By the strict complementarity condition, all $v_*^j \ne 0$ for $j \in \sigma(x_*)$; for these same values $w_*^j \ne 0$. Noting the constraint qualification, we conclude that $u = 0$ and $w^j = 0$ if $j \in \sigma(x_*)$. It was shown above that $w^j = 0$ for all $j \notin \sigma(x_*)$. Hence all the vectors $x$, $u$, $w$ satisfying (1.5) are null and $F_{zz}(z_*)$ is nonsingular. ///
LEMMA 4.1.2. Let the constraint qualification and strict complementarity condition be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$. Also, let the functions defining the problem be twice continuously differentiable in a neighborhood of $x_*$, and the matrix $L_{xx}(x_*, u_*, v_*)$ be positive definite. Then:

1. the matrix $N(z_*)$ is positive definite;

2. all roots $\lambda$ of the equation

$$|H(z_*) - \lambda I_{n+m}| = 0 \tag{1.9}$$

have strictly negative real parts;

3. the point $[x_*, y_*]$, where $y_* = [u_*, w_*]$, is a point of the strict local maximin of the problem (1.3);

4. the point $[x_*, y_*]$ is a saddle point (local in $x$, global in $y$) of the problem (1.3);

5. $x_*$ is a local solution of the problem (1.6.1).
Proof. We can assume without loss of generality that hd (x,) = 0 
owe Ab es af Ss eae hd (x,) <0 f0ry lehisias in eic.” (introduce 
the vectors 
; k=et+s 
and 


[ st+1 


i eCR ems ahs, eee her 


2, 
We can write the matrix N(Z,) defined by the formula (1.4) as 


follows: 


Fox (2,) Pex (Z¢) F xq (2,4) Ones) 


ven=| ae HOSIASET igh ABS Ses |. 


Oe srk i—2D (h? (x,4)) 


By the strict complementarity condition all w) a One © 1 
eS ele nce mt FOmMmune constraint qualification it follows 


tiatethescolumms or © F-Cz. yy pare linearly independent, the rank 


xq 
of this matrix is maximal, equal to k, and the matrix 
-1 Q : 2 a Bares fie%, 
Fax 20 F Gad Fig 2H) of dimension k is positive definite. 


The diagonal matrix $-D(h^2(x_*))$ is also positive definite by assumption. This implies that $N(z_*)$ is positive definite.

Let $H(z_*)\, z = \lambda z$ and let the vector $\bar z = [\bar x, \bar u, \bar w]$ be the complex conjugate of $z$. We assume that $|z| \ne 0$. Then

$$\operatorname{Re} \bar z^T H(z_*)\, z = \operatorname{Re} \lambda\, |z|^2 = \operatorname{Re}\left[-\bar x^T F_{xx}(z_*)\, x + 2 \bar w^T D(h(x_*))\, w\right] \le 0.$$

The last inequality follows from the positive definiteness of $F_{xx}(z_*)$ and the fact that $h(x_*) \le 0$ at a feasible point. Let $\operatorname{Re} \lambda = 0$. Then

$$\operatorname{Re}\left[-\bar x^T F_{xx}(z_*)\, x + 2 \bar w^T D(h(x_*))\, w\right] = 0$$

only if $x = 0$ and $w^j = 0$ for $j$ such that $h^j(x_*) \ne 0$. From the system $H(z_*)\, z = \lambda z$ we obtain

$$\sum_{i=1}^{e} g_x^i(x_*)\, u^i + 2 \sum_{j=1}^{c} w_*^j w^j h_x^j(x_*) = 0,$$

which may be rewritten as

$$\sum_{i=1}^{e} g_x^i(x_*)\, u^i + 2 \sum_{j \in \sigma(x_*)} w_*^j w^j h_x^j(x_*) = 0.$$

By the strict complementarity condition, if $j \in \sigma(x_*)$, then all $w_*^j \ne 0$. From the constraint qualification we obtain that the vectors $g_x^i(x_*)$ and $h_x^j(x_*)$, where $j \in \sigma(x_*)$, are linearly independent. For the above equation to hold, it is necessary that $u = 0$ and $w^j = 0$ for all $j \in \sigma(x_*)$. But we showed above that $w^j = 0$ for all $j \notin \sigma(x_*)$. Hence $u = 0$, $w = 0$ and $z = 0$, which contradicts the initial assumption $|z| \ne 0$. Thus the case $\operatorname{Re} \lambda = 0$ does not hold, and all the roots of (1.9) have negative real parts.

At a stationary point $z_* = [x_*, y_*]$ the matrices $F_{xx}$ and $N$ are positive definite, therefore by Theorem 1.5.9, $z_*$ is a strict local maximin point of the problem (1.3).

It is easy to show that $z_*$ is a saddle point (local in $x$, global in $y$), i.e., there is a neighborhood $G(x_*)$ of $x_*$ such that for any $x \in G(x_*)$, $x \ne x_*$, and any $y \in E^m$ the inequalities

$$F(x_*, y) \le F(x_*, y_*) < F(x, y_*) \tag{1.10}$$

are satisfied.

The function $F(x,y)$ is not convex/concave (convex in $x$ for any fixed $y$ and concave in $y$ for any fixed $x$). Concavity in $w$ occurs only for those $x$ with $h(x) \le 0$.

From the positive definiteness of $F_{xx}(z_*)$ and the stationary condition (1.2) we have the right-hand inequality in (1.10). Noting that $x_*$ is a feasible stationary point, we obtain

$$F(x_*, y) - F(x_*, y_*) = \sum_{i=1}^{e} g^i(x_*)\, u^i + \sum_{j=1}^{c} (w^j)^2 h^j(x_*) \le 0$$

for any $y \in E^m$, yielding in turn the left-hand side of (1.10).

For any $x \in X \cap G(x_*)$, from the right-hand inequality of (1.10) we have

$$f(x_*) = F(z_*) \le F(x, y_*) \le f(x),$$

i.e., $x_*$ is a local solution of the initial problem (1.6.1). For convex programming problems the function $F(x, y_*)$ is convex in $x$ and from the condition $F_x(x_*, y_*) = 0$ it follows that $F(x_*, y_*) \le F(x, y_*)$ for any $x \in E^n$.

If the conditions of the Lemma are satisfied, then the sufficient conditions for a minimum given by McCormick's theorem 1.7.2 hold automatically. Hence we also see that $x_*$ is a local solution to the problem (1.6.1). ///
2. NEWTON'S METHOD


For finding solutions of (1.2) we use Newton's method

$$z_{k+1} = z_k - F_{zz}^{-1}(z_k)\, F_z(z_k). \tag{1.11}$$

To implement the method, we either have to invert the symmetric matrix $F_{zz}$ of order $n+m$ or to solve the system of $n+m$ linear equations

$$F_{zz}(z_k)(z_{k+1} - z_k) = -F_z(z_k). \tag{1.12}$$


For problems of large dimensions one can use a block representation of the matrix of second derivatives

$$F_{zz}(z) = \begin{pmatrix} F_{xx}(z) & F_{xy}(z) \\ F_{yx}(z) & F_{yy}(z) \end{pmatrix}.$$

We rewrite the system (1.12) in equivalent form:

$$F_{xx}(z_k)(x_{k+1} - x_k) + F_{xy}(z_k)(y_{k+1} - y_k) = -F_x(z_k),$$
$$F_{yx}(z_k)(x_{k+1} - x_k) + F_{yy}(z_k)(y_{k+1} - y_k) = -F_y(z_k).$$

Supposing that $F_{xx}(z_k)$ is nonsingular, we extract from the first equation the difference $x_{k+1} - x_k$ and substitute the result into the second equation to obtain

$$y_{k+1} = y_k + N^{-1}(z_k)\left[F_y(z_k) - F_{yx}(z_k)\, F_{xx}^{-1}(z_k)\, F_x(z_k)\right], \tag{1.13}$$
$$x_{k+1} = x_k - F_{xx}^{-1}(z_k)\left[F_x(z_k) + F_{xy}(z_k)(y_{k+1} - y_k)\right]. \tag{1.14}$$
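The block formulas (1.13), (1.14) can be tried on a scalar example in which every block is a number. The sketch below is an assumption-laden illustration, not the author's program: the test problem (minimize $x^2$ subject to $h(x) = 1 - x \le 0$, so that $F(x,w) = x^2 + w^2(1 - x)$ and $z_* = [1, \sqrt{2}]$), the starting point, and the function name are ours, and $N$ is taken as $F_{yx} F_{xx}^{-1} F_{xy} - F_{yy}$ in agreement with the elimination described above.

```python
import math


def block_newton_step(x, w):
    """One step of (1.13)-(1.14) for F(x, w) = x^2 + w^2 * (1 - x)."""
    Fx = 2.0 * x - w * w                 # dF/dx
    Fw = 2.0 * w * (1.0 - x)             # dF/dw
    Fxx = 2.0
    Fxw = -2.0 * w                       # mixed derivative, F_xy = F_yx here
    Fww = 2.0 * (1.0 - x)
    N = Fxw * Fxw / Fxx - Fww            # N = F_yx F_xx^{-1} F_xy - F_yy
    w_new = w + (Fw - Fxw * Fx / Fxx) / N        # formula (1.13)
    x_new = x - (Fx + Fxw * (w_new - w)) / Fxx   # formula (1.14)
    return x_new, w_new


x, w = 1.1, 1.3
trajectory = [(x, w)]
for _ in range(6):
    x, w = block_newton_step(x, w)
    trajectory.append((x, w))
```

The first step moves the pair from $(1.1, 1.3)$ to roughly $(0.9913, 1.4126)$, the same point that a direct solution of the full system (1.12) gives, and the later steps converge to $(1, \sqrt{2})$.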
This process can be continued to obtain particular formulas for computing the vectors $u_k$ and $w_k$. To this end, we rewrite (1.13) in the equivalent form

$N(z_k)(y_{k+1} - y_k) = F_y(z_k) - F_{yx}(z_k) F_{xx}^{-1}(z_k) F_x(z_k)$. (1.15)

Noting that the matrices of second derivatives $F_{uu}$ and $F_{uw}$ are null, we put $N$ in block form

$N = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$.

Here we have set:

$A = F_{ux} F_{xx}^{-1} F_{xu} = g_x^T F_{xx}^{-1} g_x$,
$B = F_{ux} F_{xx}^{-1} F_{xw} = 2 g_x^T F_{xx}^{-1} h_x D(w)$,
$C = F_{wx} F_{xx}^{-1} F_{xw} - F_{ww} = 4 D(w) h_x^T F_{xx}^{-1} h_x D(w) - 2 D(h)$.

Supposing $A(z_k)$ is nonsingular, from the system (1.15) we have

$w_{k+1} = w_k + [C - B^T A^{-1} B]^{-1} [F_w - F_{wx} F_{xx}^{-1} F_x - B^T A^{-1} (F_u - F_{ux} F_{xx}^{-1} F_x)]$, (1.16)
$u_{k+1} = u_k + A^{-1} [F_u - F_{ux} F_{xx}^{-1} F_x - B [w_{k+1} - w_k]]$. (1.17)

Here all the matrices are computed at the point $z_k = [x_k, u_k, w_k]$. One can also obtain these formulas directly from (1.12) by using Frobenius' formula for inverting block matrices (see Appendix II).
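
The block elimination behind (1.13), (1.14) can be checked numerically. The following Python sketch is only an illustration: it assumes $N = F_{yx} F_{xx}^{-1} F_{xy} - F_{yy}$, as implied by (1.15), and uses randomly generated matrices in place of the derivatives of a real problem; the function name `newton_step_blocks` is hypothetical.

```python
import numpy as np

def newton_step_blocks(Fxx, Fxy, Fyy, Fx, Fy):
    """One Newton step computed by block elimination; returns (dx, dy)."""
    Fxx_inv_Fx = np.linalg.solve(Fxx, Fx)
    Fxx_inv_Fxy = np.linalg.solve(Fxx, Fxy)
    # Assumed definition: N = F_yx F_xx^{-1} F_xy - F_yy, with F_yx = F_xy^T.
    N = Fxy.T @ Fxx_inv_Fxy - Fyy
    dy = np.linalg.solve(N, Fy - Fxy.T @ Fxx_inv_Fx)   # step of (1.13)
    dx = -(Fxx_inv_Fx + Fxx_inv_Fxy @ dy)              # step of (1.14)
    return dx, dy

# Synthetic data standing in for the derivatives at z_k.
rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
Fxx = M @ M.T + n * np.eye(n)          # symmetric positive definite block
Fxy = rng.standard_normal((n, m))
Fyy = np.zeros((m, m))
Fx = rng.standard_normal(n)
Fy = rng.standard_normal(m)

dx, dy = newton_step_blocks(Fxx, Fxy, Fyy, Fx, Fy)

# Reference: solve the full system (1.12) of order n + m.
Fzz = np.block([[Fxx, Fxy], [Fxy.T, Fyy]])
dz = np.linalg.solve(Fzz, -np.concatenate([Fx, Fy]))
```

Only systems of order $n$ and $m$ are solved in the block version, which is the point of the representation when $n+m$ is large.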


Various modifications of the method are possible. One can determine $x_k$ from solving the unconstrained minimization problem by taking

$x_k \in \operatorname{Arg\,min}_x F(x, y_k)$. (1.18)

We assume that in some neighborhood of $y_*$ this procedure determines a unique differentiable function $x(y)$. In this case we need to find a vector $y$ being a solution of the system

$F_y(x(y), y) = 0$.

Using Newton's method for finding solutions to this equation, we obtain the formula

$y_{k+1} = y_k + N^{-1}(x_k, y_k) F_y(x_k, y_k)$, (1.19)

which follows also from (1.13) if we set there $F_x(x_k, y_k) = 0$. We can use the formula (1.14) to extrapolate $x$ in solving the problem (1.18).
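
A minimal sketch of the scheme (1.18), (1.19), assuming a hypothetical quadratic objective with one linear equality constraint, so that $F(x, u) = \frac{1}{2} x^T Q x - c^T x + u^T (A x - b)$ and the inner minimization has a closed form. The data `Q`, `c`, `A`, `b` and the function names are invented for the illustration.

```python
import numpy as np

# Hypothetical problem data: minimize 0.5 x'Qx - c'x subject to Ax = b.
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
c = np.array([2.0, 4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def x_of_u(u):
    """Inner problem (1.18): minimizer of F(x, u) in x for fixed u."""
    return np.linalg.solve(Q, c - A.T @ u)

def dual_newton_step(u):
    """One step of (1.19) with F_u = Ax - b and N = F_ux F_xx^{-1} F_xu."""
    x = x_of_u(u)
    N = A @ np.linalg.solve(Q, A.T)    # F_uu = 0 for this problem
    return u + np.linalg.solve(N, A @ x - b)

u1 = dual_newton_step(np.zeros(1))
x1 = x_of_u(u1)
```

Because $F_y(x(y), y)$ is affine in $y$ for this quadratic example, a single step already satisfies the stationarity conditions; for a general problem the step is repeated.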

To show the convergence of Newton's method (1.11) to a solution of (1.2), we use Theorem 2.5.3. Using Lemma 4.1.1, we express the nonsingularity condition of $F_{zz}(z_*)$ in terms used in nonlinear programming theory and arrive at the following statement.

THEOREM 4.1.1. Let the conditions of Lemma 4.1.1 hold at the Kuhn-Tucker point $[x_*, u_*, v_*]$ and let $F_{zz}(z)$ satisfy a Lipschitz condition in a neighborhood of $z_* = [x_*, u_*, w_*]$. Then the method (1.11) converges locally to the point $z_*$ with quadratic rate.

The method (1.11) thus ensures a quadratic rate of convergence 
in solving a general problem of nonlinear programming, which is 
unusually high for problems of this class. Later we will frequent- 
ly use Newton's method for these problems. However, the method 
(1.11) will be special, since it provides the simplest rapidly 
converging computing formulas. The method is widely used for 
solving various problems of small dimension. As a rule, this method is used to finish the computations; it is applied after other, coarser methods have been used to get the initial approximation for the vector $x$ and the dual vectors.

Just as was done in Section 2.5 in describing Newton's method, we can list here several disadvantages of the method (1.11) which may complicate solving the problem (1.6.1):

1. in some cases, the method diverges;

2. the method is not practical for solving problems where the dimension $n+m$ of the vector $z$ is large because we then have to calculate and invert a high-order matrix;

3. for an insufficiently accurate initial approximation the method may converge to extraneous stationary points (for example, to unfeasible $x$);

4. if any components of the vector $w$ are zero in the initial iterations, they remain equal to zero throughout all the iterations ("sticking" of the components of $w$ occurs).
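
Drawbacks 3 and 4 can be seen on a one-dimensional toy problem. The Python sketch below is an illustration only: it assumes the simplest modification of the Lagrangian has the form $F(x, w) = f(x) + w^2 h(x)$, takes the hypothetical problem of minimizing $(x-2)^2$ subject to $x - 1 \le 0$, and applies the pure Newton iteration (1.11). The function names and starting points are chosen for the example.

```python
import numpy as np

def grad(z):
    """Gradient of F(x, w) = (x - 2)^2 + w^2 (x - 1)."""
    x, w = z
    return np.array([2.0 * (x - 2.0) + w**2, 2.0 * w * (x - 1.0)])

def hess(z):
    """Matrix of second derivatives of F."""
    x, w = z
    return np.array([[2.0, 2.0 * w], [2.0 * w, 2.0 * (x - 1.0)]])

def newton(z0, iters=25):
    """Pure Newton iteration z <- z - F_zz^{-1} F_z."""
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - np.linalg.solve(hess(z), grad(z))
    return z

z_good = newton([0.0, 1.0])    # w starts nonzero
z_stuck = newton([0.0, 0.0])   # w starts at zero and never moves
```

With $w_0 = 0$ the row of the Newton system belonging to $w$ reduces to $2h(x)\,\delta w = 0$, so $w$ stays at zero and the iteration ends at the unconstrained minimizer $x = 2$, which violates $x \le 1$.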

Drawback 1 can be avoided in some cases by introducing a variable step; the rules for adjusting the step are described in Section 2.5.

Drawback 2 can be removed by using the formulas (1.13) and (1.14), or (1.16) and (1.17), or (1.19) instead of (1.11). One can also apply various quasi-Newton methods.

The last two drawbacks are characteristic of all methods based on the simplest modification of a Lagrangian of the form (1.1). One can try to add to $F$ penalizing terms, e.g. by setting
e€ c 
, \ 
P@=F@te| 3 te cortd me |. 
c= {= 
If $z_*$ is a stationary feasible point of $F(z)$, then it will be a stationary point of $P(z)$. We can therefore find stationary points of $P(z)$. This technique somewhat expands the region of convergence; however it does not remove the possibility that for a fixed value of $\tau$ the process may converge to an unfeasible point.


3. ARTIFICIAL VECTORS


Another way of reducing the initial problem (1.6.1) to solving systems of equations is to introduce artificial variables making it possible to reduce (1.6.1) to a nonlinear programming problem in which the feasible set is given by equality-type constraints only. We used this technique in Subsection 1.7.4 and reduced the problem (1.6.1) to an equivalent problem (1.7.17) for which the Lagrangian $L^1$ has the form (1.7.18) and the necessary conditions of the minimum are given by the relations (1.7.20), or in more detailed form, (1.7.21). Using the notation of Section 1.7 we express Newton's method for finding stationary points of the function $L^1$ as follows:

$L^1_{zz}(z_k, y_k)(z_{k+1} - z_k) + L^1_{zy}(z_k, y_k)(y_{k+1} - y_k) = -L^1_z(z_k, y_k)$,
$L^1_{yz}(z_k, y_k)(z_{k+1} - z_k) = -L^1_y(z_k, y_k)$. (1.20)


The matrix $L^1_{zy}$ is defined similarly to (1.7.22):


&x (x) | hy (x) 


Ly ’ a | aa ve ’ ts 
Ve Dl, Log [Eee a 


Let

$A(z, y) = \begin{bmatrix} L^1_{zz}(z, y) & L^1_{zy}(z, y) \\ L^1_{yz}(z, y) & 0_{mm} \end{bmatrix}$.

It is not hard to show that if the conditions of Lemma 4.1.1 are satisfied, the matrix $A(z_*, y_*)$ is nonsingular. Therefore the method (1.20) is locally convergent under standard assumptions.

The method (1.20) is harder to implement than (1.11) since here we have to invert a matrix of order $n+m+c$. However, when using the method (1.20) we can assert that if the process converges to some point $[z_*, y_*]$, then the point $x_*$ is feasible in the problem (1.6.1); the method (1.11) does not have this property. At the same time, we are not guaranteed here that the components of the dual vector $v_*$ are nonnegative. In later sections we will make more radical modifications of the Lagrangian enabling us to avoid this drawback.


4. ITERATIVE METHODS


If the conditions of Lemmas 4.1.1 and 4.1.2 are satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$, the corresponding point $[x_*, u_*, w_*]$ is a strict local maximin point of the function $F(z)$, and all the methods described in Section 2.6 can be used to solve the problem (1.6.1). For example, the methods (2.6.1), (2.6.7),
CZ OLELSD Ra CaO mano C2 ea op meal Deed ast Ons @ ihn Om Glo) maice 


written in the form: 


$\dot y = \varepsilon F_y$, $\dot x = -F_x$, (1.21)
Y=, x=—F eb ey ys (ies) 
$\dot y = F_y - F_{yx} F_{xx}^{-1} F_x$, $\dot x = -F_x$, (1.23)
$\dot y = F_y(x(y), y)$, $\dot x = x(y) - x$, (1.24)
$\dot y = F_y(x(y), y)$. (1.25)

Here the dependence $x(y)$ is defined from (1.18); $\varepsilon$ is a small parameter ($0 < \varepsilon \ll 1$); when the arguments of $F$ are not shown it is assumed that the derivatives of this function are computed at $[x(t), y(t)]$.

THEOREM 4.1.2. Let the conditions of Lemma 4.1.2 be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$. Then there exist $\bar\varepsilon > 0$, $\bar\alpha > 0$ such that for any fixed $0 < \varepsilon < \bar\varepsilon$, $0 < \alpha < \bar\alpha$ and for $\varepsilon = 1$, the methods (1.21) - (1.25) and their discrete variants of the form (2.6.20) converge locally to the point $z_* = [x_*, y_*]$, $y_* = [u_*, w_*]$.

Compared with the theorems of Section 2.6, the only new assertion is the convergence of the system (1.21) for $\varepsilon = 1$. Hence we need to prove this assertion only. Form a variational equation. Let

$\delta x = x(t) - x_*$, $\delta u = u(t) - u_*$, $\delta w = w(t) - w_*$, $\delta z = [\delta x, \delta u, \delta w]$,

where $x(t)$, $u(t)$, $w(t)$ are solutions of the system (1.21) satisfying the initial conditions $x(0) = x_0$, $u(0) = u_0$, $w(0) = w_0$. Removing second-order terms in $\delta z$, we obtain

$\delta \dot z = H(z_*) \delta z$.

From the second assertion of Lemma 4.1.2 we have that the characteristic values of the matrix $H(z_*)$ have strictly negative real parts. This by Theorem 2.1.1 implies the asymptotic, exponential stability of $z_*$ and therefore the local convergence of the method (1.21) and of its discrete variant.

If there are no inequality-type constraints in the problem (1.6.1), then for $\varepsilon = 1$ the method (1.21) is called the Arrow-Hurwicz method. It has a low rate of convergence and therefore is used only when other, rapidly converging methods are inapplicable (for example, in solving large-dimension problems). Note that to take into account inequality-type constraints one can use, rather than (1.21), a method analogous to (2.4.5):

$x_{k+1} = x_k - \alpha_k L_x(x_k, u_k, v_k)$,
$u_{k+1} = u_k + \alpha_k L_u(x_k, u_k, v_k)$,
$v^j_{k+1} = \max[0, v^j_k + \alpha_k L_{v^j}(x_k, u_k, v_k)]$, $\alpha_k > 0$, $j \in [1:c]$.

Here the step $\alpha_k$ is taken either to be constant or to vary according to the rules (2.4.6) - (2.4.8).
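
As an illustration of the last scheme, here is a minimal Python sketch with a constant step. It assumes the classical Lagrangian $L = f + u g + v h$ and a hypothetical two-dimensional problem: minimize $(x^1 - 2)^2 + (x^2 - 1)^2$ subject to $g(x) = x^1 + x^2 - 2 = 0$ and $h(x) = x^1 - 1.2 \le 0$. The step size, the iteration count and the function name are choices made for the example, not values from the text.

```python
import numpy as np

c = np.array([2.0, 1.0])      # f(x) = |x - c|^2
gx = np.array([1.0, 1.0])     # gradient of g(x) = x1 + x2 - 2
hx = np.array([1.0, 0.0])     # gradient of h(x) = x1 - 1.2

def arrow_hurwicz(alpha=0.05, iters=5000):
    """Gradient descent in x, ascent in u, projected ascent in v."""
    x = np.zeros(2)
    u = 0.0
    v = 0.0
    for _ in range(iters):
        g = x[0] + x[1] - 2.0           # L_u = g(x)
        h = x[0] - 1.2                  # L_v = h(x)
        Lx = 2.0 * (x - c) + u * gx + v * hx
        # All three updates use the values from the previous iteration.
        x, u, v = x - alpha * Lx, u + alpha * g, max(0.0, v + alpha * h)
    return x, u, v

x, u, v = arrow_hurwicz()
```

The projection `max(0.0, ...)` keeps the multiplier of the inequality constraint nonnegative at every iteration. Several thousand iterations are needed even for this tiny problem, in line with the low rate of convergence noted above.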


2. MODIFIED LAGRANGIANS 


1. PRELIMINARY RESULTS 


The reduction to solving systems of nonlinear equations, already expressed in Theorem 1.6.4, furnishes a rich class of numerical methods for solving nonlinear programming problems. We will develop this approach, working with the following modified Lagrangian:

$H(x, u, v) = f(x) + \sum_{i=1}^{e} \varphi(g^i(x), u^i) + \sum_{j=1}^{c} \psi(h^j(x), v^j)$. (2.1)

The functions $\varphi$ and $\psi$ must be such as to guarantee that (1.6.18) holds, which is written in the form

$\sum_{j=1}^{c} \psi(h^j(x), v^j_*) \le \sum_{j=1}^{c} \psi(h^j(x_*), v^j_*)$, (2.2)

where $x \in X$. We will give below further conditions for the choice of $\varphi$ and $\psi$.

For a fixed vector $[u, v]$ we define the set

$X(u, v) = \operatorname{Arg\,min}_{x \in E^n} H(x, u, v)$. (2.3)


A necessary condition of the minimum of $H(x, u, v)$ in $x$ is that

$H_x(x, u, v) = f_x(x) + \sum_{i=1}^{e} \varphi_g(g^i(x), u^i) g^i_x(x) + \sum_{j=1}^{c} \psi_h(h^j(x), v^j) h^j_x(x) = 0$ (2.4)

holds. Let us compare this with the necessary conditions for a minimum of the Lagrangian in $x$:

$L_x(x, u, v) = f_x(x) + \sum_{i=1}^{e} u^i g^i_x(x) + \sum_{j=1}^{c} v^j h^j_x(x) = 0$.

It makes sense to introduce the system of equations

$\varphi_g(g^i, u^i) = u^i$, $i \in [1:e]$, (2.5)
$\psi_h(h^j, v^j) = v^j$, $j \in [1:c]$ (2.6)

and to choose $\varphi$ and $\psi$ so that this system is solvable iff $g^i = 0$, $h^j \le 0$, $v^j h^j = 0$. If we can find $x_*$, $u_*$, $v_*$ for which the system (2.5), (2.6) is solvable, $x_* \in X(u_*, v_*)$ and the conditions (2.2) hold, then by Theorem 1.6.4, the problem (1.6.1) will have a solution since $x_* \in X_*$.

Throughout below we will assume that the functions $\varphi(g^i, u^i)$, $\psi(h^j, v^j)$ are continuous for any values of the arguments and at least continuously differentiable in $g^i$ and $h^j$ respectively. Let $J(h^j)$ denote the set of real solutions to (2.6) depending on $h^j$ as a parameter, and let the set of nonnegative elements in $J(h^j)$ be written $J_+(h^j)$. Let us impose additional conditions on $\varphi$ and $\psi$.

$A_1$. For any real $u^i$ we have

1. $\varphi_g(0, u^i) = u^i$;

2. if $g^i \ne 0$, then $\varphi_g(g^i, u^i) \ne u^i$.

$A_2$. For $h^j > 0$, $J(h^j) = \emptyset$; for $h^j < 0$ the set $J(h^j)$ consists of zero; $J(0)$ equals the set of all nonnegative numbers.

$A_3$. For $h^j > 0$, $J_+(h^j) = \emptyset$; for $h^j < 0$ the set $J_+(h^j)$ consists of zero; any nonnegative number belongs to $J_+(0)$.

$A_4$. If $x_* \in X$ and the complementarity condition is satisfied, $v^j_* h^j(x_*) = 0$, $0 \le v^j_*$, then for any $x \in X$ one has the inequality

$\sum_{j=1}^{c} \psi(h^j(x), v^j_*) \le \sum_{j=1}^{c} \psi(h^j(x_*), v^j_*)$.

$A_5$. For any $h^j$ and any $v^j \ge 0$ the derivative $\psi_h(h^j, v^j) \ge 0$.

Condition $A_4$ ensures the inequality (2.2). Condition $A_5$ will be useful for implementing the numerical methods. Let us give some functions satisfying $A_1$:
$\varphi^1(g^i, u^i) = g^i u^i + \frac{1}{2}(g^i)^2$, $\varphi^2(g^i, u^i) = g^i u^i + \operatorname{ch} g^i$,
$\varphi^3(g^i, u^i) = g^i (u^i - 1) + e^{g^i}$,
ot (gi, wi) = glu! + 5 [2g arctg gi—In[I +(e) J] e-™, 


$\varphi^5(g^i, u^i) = g^i u^i + \frac{1}{2}(g^i)^2 (1 + \varepsilon \cos u^i)$, $-1 < \varepsilon < 1$.

All these functions are of the class that can be represented in the form

$\varphi(g^i, u^i) = g^i (u^i - a'(0) \beta(u^i)) + a(g^i) \beta(u^i)$.

Here $a(g)$ is a differentiable function such that $a'(0) \ne a'(\alpha)$ for any $\alpha \ne 0$, the function $\beta(\alpha)$ is differentiable and $\beta(\alpha) \ne 0$ for any $\alpha$. In even more general form,

$\varphi(g^i, u^i) = g^i (u^i - \eta_g(0, u^i)) + \eta(g^i, u^i)$,

where for any $u^i$ the equality $\eta_g(0, u^i) = \eta_g(g^i, u^i)$ holds only for $g^i = 0$.


As simple examples for $\psi$ we give the following:

$\psi^1(h^j, v^j) = \frac{1}{2}[(h^j + v^j)_+]^2$,
pe(h/, v/) = (hh) + ver, 
we, o) = (hh) + 
v (1trAs- SE (pep EOE) (ans it WSO, 
(1—A/)-" Lf pihi <0, 
Wt (W!, v/) = 9 (hh) +0! [hi —y (AL), 


ede 1 [A it” jv 20; 
VW, wr [F(A both (o/)'] + 0! 


arctg hi’ it hi <0. 
Here re> Oj thestunetion a4) a1sesuch that. (C0) v=) 0m ay (aba 
UC y= (Ca) ee OO mae. Onna e Vale (Cc) es Oot hee) Ome CSelyin 

4 
y¥(a) = aoe oe Bo) 4 


The second and third functions can be written in the form

$\psi(h^j, v^j) = \gamma(h^j) + v^j a(h^j)$,


where a(a) is twice continuously differentiable, 0 =< a*(a) < 1 
Hee ey se We al SS eh Gey ateke eh. SY Wy, 
; al 4 6 ; 
They Lune TvONS Wy — ail) eee ey) satisfy Ag, Ay, A. Let us 
check $A_3$ for $\psi^4$. Let $h^j > 0$; then (2.6) has the form $v^j + \gamma'(h^j) = v^j$, i.e. $\gamma'(h^j) = 0$, which holds only if $h^j = 0$, but this contradicts the initial assumption. Therefore, for $h^j > 0$ (2.6) cannot be satisfied for any $v^j$. Let $h^j < 0$. Then

$\gamma(h^j) = \gamma'(h^j) = 0$, $\gamma'(-h^j) > 0$,
$v^j + v^j \gamma'(-h^j) = v^j$,

and therefore (2.6) is solvable only if $v^j = 0$. For $h^j = 0$ any value of $v^j$ satisfies (2.6). Similarly we convince ourselves that $A_3$ holds for the remaining functions.
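
The same kind of check can be done numerically. The Python sketch below is a hedged illustration for the quadratic function only, assuming it has the form $\psi^1(h, v) = \frac{1}{2}[\max(0, h + v)]^2$, so that (2.6) reads $\max(0, h + v) = v$. It scans a grid of values of $v$ for three fixed values of $h$; the grid and the helper names are invented for the example.

```python
import numpy as np

def psi1_h(h, v):
    """Derivative in h of the assumed psi^1(h, v) = 0.5 * max(0, h + v)**2."""
    return np.maximum(0.0, h + v)

def solutions(h, grid):
    """Grid points v that satisfy equation (2.6), psi1_h(h, v) = v."""
    return grid[np.isclose(psi1_h(h, grid), grid, atol=1e-12)]

grid = np.linspace(-2.0, 2.0, 401)   # candidate values of v, step 0.01
J_pos = solutions(0.5, grid)         # h > 0
J_neg = solutions(-0.5, grid)        # h < 0
J_zero = solutions(0.0, grid)        # h = 0
```

On this grid the solution set is empty for $h > 0$, consists of $v = 0$ for $h < 0$, and consists of all nonnegative grid points for $h = 0$, which is the behavior required by condition $A_2$.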


If the strict complementarity condition is satisfied at the point $[x_*, v_*]$, then condition $A_4$ follows from the inequalities


H(A (x), = (M(x) +o) < Fol)’, 

(hi (x), vt) Sole @ <p (hi (x), ol) =o, 

(W(x), vf) = of (Al (a) SW (re), oh) = Sol, 
wpe (fi (x), vf) = olf (x) — oly (— A(x) <p" (I (xe), of) =0. 


4 


? 


It is also obvious that Ae holds for the functions vow 


Condition Ay ts Vsatistied for oe and oo. 


2. NUMERICAL METHODS USING MODIFIED LAGRANGIANS 


For each fixed vector $[u, v]$ we determine $x(u, v) \in X(u, v)$ from (2.3), next we choose $[u, v]$ such that the following system of nonlinear equations is satisfied:

$\varphi_g(g(x(u, v)), u) = u$, $\psi_h(h(x(u, v)), v) = v$. (2.7)

Here

$\varphi_g(g, u) = [\varphi_g(g^1, u^1), \ldots, \varphi_g(g^e, u^e)]$, $\psi_h(h, v) = [\psi_h(h^1, v^1), \ldots, \psi_h(h^c, v^c)]$.


THEOREM 4.2.1. Let the functions $\varphi$ and $\psi$ be constructed according to conditions $A_1$ and $A_2$ and let there exist vectors $u_*$, $v_*$, $x_* \in X(u_*, v_*)$ satisfying (2.7). Then in the problem (1.6.1):

1. if $A_4$ holds then $X_* \ne \emptyset$, $x_* \in X_*$;

2. if $H(x, u_*, v_*)$ is differentiable at $x_*$, then $[x_*, u_*, v_*]$ is a Kuhn-Tucker point.

Proof. From $A_1$ it follows that $g(x_*) = 0$, from $A_2$ that $h(x_*) \le 0$, $v_* \ge 0$ and $v^j_* h^j(x_*) = 0$, $j \in [1:c]$. Hence the conditions (1.6.17) hold and when $A_4$ is satisfied, it follows from Theorem 1.6.4 that $x_* \in X_*$. Noting that at $[x_*, u_*, v_*]$ the condition (2.4) holds, we conclude that $[x_*, u_*, v_*]$ is a Kuhn-Tucker point.

Theorem 4.2.1 opens many possibilities for using numerical methods of solving systems of equations in order to solve the initial nonlinear programming problem (1.6.1). Using, say, the method of simple iteration, we obtain the following method:

$u_{k+1} = \varphi_g(g(x_k), u_k)$, (2.8)
$v_{k+1} = \psi_h(h(x_k), v_k)$, (2.9)

where $x_k \in X(u_k, v_k)$. If this process converges to some point $[u, v]$ and $x \in X(u, v)$, then, when conditions $A_1$ and $A_2$ are satisfied, the point $x$ will be feasible, $[x, u, v]$ will be a Kuhn-Tucker point, and when $A_4$ is satisfied $x \in X_*$. If we require $A_3$ to be satisfied instead of $A_2$, then there is no guarantee that $v \ge 0$. To exclude this possibility, we introduce condition $A_5$ by which from $v_0 \ge 0$ it follows that in the process of calculations by (2.8), (2.9) all $v_k \ge 0$.
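
A minimal Python sketch of the simple iteration (2.8), (2.9) follows. It is an illustration under stated assumptions: the functions are taken as $\varphi(g, u) = g u + \frac{1}{2} g^2$ and $\psi(h, v) = \frac{1}{2}[\max(0, h + v)]^2$, so that the updates become $u_{k+1} = u_k + g(x_k)$ and $v_{k+1} = \max[0, v_k + h(x_k)]$; the test problem (minimize $(x^1 - 2)^2 + (x^2 - 1)^2$ subject to $x^1 + x^2 - 2 = 0$, $x^1 - 1.2 \le 0$), the inner gradient descent and all names are hypothetical.

```python
import numpy as np

c = np.array([2.0, 1.0])      # f(x) = |x - c|^2
gx = np.array([1.0, 1.0])     # gradient of g(x) = x1 + x2 - 2
hx = np.array([1.0, 0.0])     # gradient of h(x) = x1 - 1.2

def g(x):
    return x[0] + x[1] - 2.0

def h(x):
    return x[0] - 1.2

def grad_H(x, u, v):
    """Gradient in x of H = f + u g + g^2 / 2 + max(0, h + v)^2 / 2."""
    return 2.0 * (x - c) + (u + g(x)) * gx + max(0.0, h(x) + v) * hx

def argmin_H(x, u, v, step=0.2, iters=100):
    """Auxiliary problem (2.3), solved here by plain gradient descent."""
    for _ in range(iters):
        x = x - step * grad_H(x, u, v)
    return x

def multiplier_iteration(outer=200):
    x = np.zeros(2)
    u = 0.0
    v = 0.0
    for _ in range(outer):
        x = argmin_H(x, u, v)
        u, v = u + g(x), max(0.0, h(x) + v)   # updates (2.8), (2.9)
    return x, u, v

x, u, v = multiplier_iteration()
```

At a fixed point of the two updates $g(x) = 0$, $h(x) \le 0$, $v \ge 0$ and $v h(x) = 0$ hold automatically, which is the content of Theorem 4.2.1 for this particular choice of functions.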


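Introduce the abbreviated notation:

$y = [u, v] \in E^m$, $\Phi(x, y) = [\varphi_g(g(x), u), \psi_h(h(x), v)]$,
$H(x, u, v) = H(x, y)$, $R_x(x) = [g_x(x), h_x(x)]$,
$H_{xy}(x, y) = [g_x(x) \varphi_{gu}(g(x), u), h_x(x) \psi_{hv}(h(x), v)]$,

$\Phi_y(x, y) = \begin{bmatrix} \varphi_{gu}(g(x), u) & 0 \\ 0 & \psi_{hv}(h(x), v) \end{bmatrix}$, $\Phi_x^T(x, y) = \begin{bmatrix} \varphi_{gg}(g(x), u) g_x^T(x) \\ \psi_{hh}(h(x), v) h_x^T(x) \end{bmatrix}$,

$B(x, y) = \Phi_y(x, y) - \Phi_x^T(x, y) H_{xx}^{-1}(x, y) H_{xy}(x, y)$; (2.10)

$H_x(x, y) = f_x(x) + g_x(x) \varphi_g(g(x), u) + h_x(x) \psi_h(h(x), v)$,
$\tilde L_{xx}(x, y) = f_{xx}(x) + \sum_{i=1}^{e} g^i_{xx}(x) \varphi_g(g^i(x), u^i) + \sum_{j=1}^{c} h^j_{xx}(x) \psi_h(h^j(x), v^j)$,
$H_{xx}(x, y) = \tilde L_{xx}(x, y) + R_x(x) \Phi_x^T(x, y)$. (2.11)

The matrices $\varphi_{gg}$, $\psi_{hh}$ are diagonal with the diagonal elements $\partial^2 \varphi(g^i, u^i) / \partial g^i \partial g^i$, $\partial^2 \psi(h^j, v^j) / \partial h^j \partial h^j$ respectively. The matrices $R_x$, $H_{xy}$, $\Phi_y$, $\Phi_x^T$, $B$, $\tilde L_{xx}$, $H_{xx}$ have dimension $n \times m$, $n \times m$, $m \times m$, $m \times n$, $m \times m$, $n \times n$, $n \times n$. They are related as follows:

$H_{xy}(x, y) = R_x(x) \Phi_y(x, y)$. (2.12)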


One can write the system (2.7) and the method (2.8), (2.9) in concise form:

$y = \Phi(x(y), y)$, (2.13)
$y_{k+1} = \Phi(x(y_k), y_k)$. (2.14)

Applying Newton's method to solving the system (2.13), we arrive at the process

$y_{k+1} = y_k - [B(x(y_k), y_k) - I_m]^{-1} [\Phi(x(y_k), y_k) - y_k]$. (2.15)


We do not need to introduce the auxiliary problem (2.3); instead we apply Newton's method directly to solving the initial system of nonlinear equations (2.4) - (2.6). We then obtain the method

$H_{xx}(x_k, y_k)(x_{k+1} - x_k) + H_{xy}(x_k, y_k)(y_{k+1} - y_k) = -H_x(x_k, y_k)$,
$\Phi_x^T(x_k, y_k)(x_{k+1} - x_k) + [\Phi_y(x_k, y_k) - I_m](y_{k+1} - y_k) = -\Phi(x_k, y_k) + y_k$.

Using Frobenius' formula, we find next that

$y_{k+1} = y_k - (B - I_m)^{-1} [\Phi - y_k - \Phi_x^T H_{xx}^{-1} H_x]$, (2.16)
$x_{k+1} = x_k - H_{xx}^{-1} H_x - H_{xx}^{-1} H_{xy} (y_{k+1} - y_k)$. (2.17)

Here all the functions and matrices are computed for $x = x_k$, $y = y_k$.

If the dependence $x(y)$ is determined from (2.3), then $H_x = 0$ by (2.4), and (2.15) follows from (2.16). If we set $\varphi = \varphi^1$ in (2.15) and omit taking into account inequality-type constraints, then (2.15) turns into a method suggested and studied by a number of authors. We refer, say, to the work of Bertsekas [3] in which he studies the case when the auxiliary problem (2.3) is solved approximately: $\|H_x(x_k, y_k)\| \ne 0$ on each iteration and the formula (2.16) is used for computation.


3. PROOF OF CONVERGENCE FOR THE SIMPLE ITERATION METHOD 


1. BASIC CONVERGENCE THEOREMS 


Let

$\Delta_1(x, u, v) = \min_{i \in [1:e],\, j \in \sigma(x)} [\varphi_{gg}(g^i(x), u^i), \psi_{hh}(h^j(x), v^j)]$,

$\Delta_2(x, v) = \min_{j \notin \sigma(x)} \psi_{hh}(h^j(x), v^j)$.

Let us impose another condition on $\varphi$ and $\psi$.

$A_6$. At the Kuhn-Tucker point $[x_*, u_*, v_*]$ the system (2.5), (2.6) is satisfied and the quantity $\Delta_2(x_*, v_*) \ge 0$.

In the sequel we shall need the characteristic equation of the matrix $B$:

$|B(x_*, u_*, v_*) - \lambda I_m| = 0$. (3.1)

Here and below, in proving the convergence, we assume that $\varphi$ and $\psi$ are twice differentiable in their arguments.
THEOREM 4.3.1. Let condition $A_6$ and the sufficient conditions for a minimum of Theorem 1.7.4 be satisfied at the Kuhn-Tucker point $z_* = [x_*, u_*, v_*]$. Furthermore, let the function $H(x, u, v)$ be twice continuously differentiable in a neighborhood of $z_*$ and the moduli of all roots of the equation (3.1) be strictly less than 1. Then, if $\Delta_1(x_*, u_*, v_*)$ is sufficiently large, the problem (2.3) has a local solution and the method (2.14) converges locally to $z_*$.

Proof. The system (2.5), (2.6) is satisfied at the Kuhn-Tucker point. Therefore, the formula (2.11) can be written in the following form:

$H_{xx}(x_*, u_*, v_*) = L_{xx}(x_*, u_*, v_*) + g_x(x_*) \varphi_{gg}(0, u_*) g_x^T(x_*) + h_x(x_*) \psi_{hh}(h(x_*), v_*) h_x^T(x_*)$.


For the corresponding quadratic form we have the estimates

$x^T H_{xx}(x_*, u_*, v_*) x \ge x^T \Big[ L_{xx}(x_*, u_*, v_*) + \Delta_1(x_*, u_*, v_*) \Big( g_x(x_*) g_x^T(x_*) + \sum_{j \in \sigma(x_*)} h^j_x(x_*) [h^j_x(x_*)]^T \Big) \Big] x + \Delta_2(x_*, v_*) \sum_{j \notin \sigma(x_*)} [x^T h^j_x(x_*)]^2 \ge$
$\ge x^T L_{xx}(x_*, u_*, v_*) x + \Delta_1(x_*, u_*, v_*)\, x^T \Big[ g_x(x_*) g_x^T(x_*) + \sum_{j \in \sigma(x_*)} h^j_x(x_*) [h^j_x(x_*)]^T \Big] x$.

By Finsler's lemma 1.7.4 there is a $\tau$ such that when $\Delta_1(x_*, u_*, v_*) > \tau$ the matrix $H_{xx}(x_*, u_*, v_*)$ is positive definite. But then there exist neighborhoods $G(x_*)$ and $G(y_*)$ of $x_*$ and $y_* = [u_*, v_*]$, respectively, such that for $x \in G(x_*)$, $y \in G(y_*)$ the matrix $H_{xx}(x, u, v)$ is positive definite, for any $y \in G(y_*)$ there exists an isolated local solution $x = x(y) \in G(x_*)$ of the problem (2.3) and the necessary condition of the minimum

$H_x(x(y), y) = 0$, $x_* = x(y_*)$ (3.2)


holds. On the other hand, according to the implicit function theorem we can find a neighborhood $G_1(y_*)$ of $y_*$ such that on $G_1(y_*)$ there exists a continuously differentiable function $x(y)$ satisfying (3.2). Taking $G_1(y_*)$ so small that $G_1(y_*) \subset G(y_*)$, we obtain from (3.2) that

$\frac{dx(y)}{dy} = -H_{xx}^{-1}(x(y), y) H_{xy}(x(y), y)$.

Hence the composite function $\Phi(x(y), y)$ is differentiable at $y = y_*$ and

$\frac{d\Phi(x(y_*), y_*)}{dy} = B(x_*, y_*)$,

where the matrix $B$ is defined from (2.10). By Theorem 2.3.4, for the local convergence of the method (2.14) it suffices that the spectral radius of $B(x_*, y_*)$ be less than one, which is known to hold if the moduli of the roots of the equation (3.1) are strictly less than 1.

Theorems 4.2.1 and 4.3.1 characterize different properties of the method (2.14). Theorem 4.3.1 is analogous to Theorem 4.1.2 and they both give sufficient conditions for the local convergence to a Kuhn-Tucker point. Theorem 4.2.1 shows the difference of (2.14) from the methods described in Section 4.1; by conditions $A_1$ - $A_5$ we are assured that in the case of convergence the limit points will be Kuhn-Tucker points. The methods of Section 4.1 do not have these properties.

The analysis of the roots of (3.1) is difficult. Hence, introducing additional assumptions, we transform the equation (3.1) to another form. We write the vector $h$ in the form


h(%) = [he (x), Bb (x) ys AS (xe X) ae «3, 18 (x)], 
ne (xy eS Xho dd) 5 


Analogously, we write the matrices 


Ri (x) =[ex (x), E(x], Re(*)=[RE(*), be), 
“gu (g(x), 4) 0 


aaa | Dt ho (h (Xs 0) 


aly ax | ee SO): oe 
[De (x, y)] maa a (i (x), 0) [m2 (x) Jt 


: 5 a a a Qa a 
in block form. Here the matrices Ry» hes a (e ; buh have 
dimensions nx(ets), nx(c-s), (ets)x(e-s), (e+s)xXn, SxS, 


respectively. Therefore, we have the representation 


@ai 0 
res le eKer ar | 
[oe (x, y)]* | 
span (A(x), 2) [he (x) ]? 


Let [xy Uy > Vy) be a Kuhn-Tucker point. We can assume with- 
out loss of generality that the first s components of u(x, 


ATE nae = ie 
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| Dj (Xe; Ys) —) | DY (Xe Vey)" Be (S55 Yx) ny (Xs, a) ee =0. (3.4) 


THEOREM 4.3.2. Let condition $A_6$ and the conditions of Lemma 4.1.1 be satisfied at the Kuhn-Tucker point $z_* = [x_*, u_*, v_*]$, let the function $H(x, u, v)$ be twice continuously differentiable in a neighborhood of $z_*$ and let the moduli of the roots of the equation (3.4) be strictly less than 1. Furthermore, for all $j \notin \sigma(x_*)$ let

$\psi_{hh}(h^j(x_*), v^j_*) = 0$, $-1 < \psi_{hv}(h^j(x_*), v^j_*) < 1$. (3.5)

Then, if $\Delta_1(x_*, u_*, v_*)$ is sufficiently large, the problem (2.3) has a local solution and the method (2.14) converges locally to $z_*$.
Proof. Suppose that the matrices $H_{xx}(x, y)$ and $\tilde L_{xx}(x, y)$ are nonsingular. We multiply (2.11) on the left by $\Phi_x^T(x, y) \tilde L_{xx}^{-1}(x, y)$ and on the right by $H_{xx}^{-1}(x, y) H_{xy}(x, y)$. We obtain

$\Phi_x^T(x, y) \tilde L_{xx}^{-1}(x, y) H_{xy}(x, y) = \Gamma(x, y) \Phi_x^T(x, y) H_{xx}^{-1}(x, y) H_{xy}(x, y)$.

Here we have set

$\Gamma(x, y) = I_m + \Phi_x^T(x, y) \tilde L_{xx}^{-1}(x, y) R_x(x)$.

Using (2.12), (2.10), we transform this relation to the form

$(\Gamma(x, y) - I_m) \Phi_y(x, y) = \Gamma(x, y) (\Phi_y(x, y) - B(x, y))$,

yielding

$\Gamma(x, y) B(x, y) = \Phi_y(x, y)$. (3.6)

The solvability of (2.3) is proved just as in the preceding theorem. By Lemma 4.1.1 the matrix $\tilde L_{xx}(x_*, y_*) = L_{xx}(x_*, y_*)$ is nonsingular. Hence the matrix $\Gamma(x_*, y_*)$ is defined and by (3.6)

$\Gamma(x_*, y_*) B(x_*, y_*) = \Phi_y(x_*, y_*)$.

Hence the equation (3.1) is equivalent to

$|\Phi_y(x_*, y_*) - \lambda \Gamma(x_*, y_*)| = 0$,


which with (3.5) taken into account we can rewrite as 


og —ALOeLErRE Mess ALOR] Litt] _ 9 


0 Pho Al ens 
Thus the $c - s$ roots are found explicitly:

$-1 < \lambda_j = \psi_{hv}(h^j(x_*), v^j_*) < 1$, $j \in [1+s:c]$.

The remaining $m - (c - s)$ roots are found from (3.4). Thus, the moduli of all the roots $\lambda$ are strictly less than 1, which implies the convergence of the method.


2. SIMPLIFIED VERSIONS OF THE METHOD 


The auxiliary problem (2.3) is usually solved only approximately. Just as was done in the second simplified version of the penalty function method, we will curtail the labor of minimizing $H(x, y_k)$ in $x$ as soon as we find the point $x_k$ satisfying

$\|H_x(x_k, y_k)\| \le \varepsilon_k$. (3.7)

Here $\varepsilon_k \ge 0$. To prove the convergence of this version of the method, we assume that $\Phi(x, y)$ satisfies a Lipschitz condition in $x$:

$\|\Phi(x_1, y) - \Phi(x_2, y)\| \le l \|x_1 - x_2\|$. (3.8)

We bring in the lower bound of the quadratic form of $H_{xx}$:

$d(x, y) = \min_{\|\bar x\| = 1} \bar x^T H_{xx}(x, y) \bar x$. (3.9)

If all the conditions of Theorems 4.3.1 and 4.3.2 are satisfied, then $d(x_*, y_*) > 0$ and there exist neighborhoods $G(x_*)$ and $G(y_*)$ such that

$d_* = \inf_{x \in G(x_*)} \inf_{y \in G(y_*)} d(x, y) > 0$. (3.10)
Let us determine the error on the $k$-th iteration when using the method (2.14) with the simplified rule (3.7) for solving the auxiliary problem. We have

$\|y_{k+1} - y_*\| = \|\Phi(x_k, y_k) - y_*\| = \|\Phi(x(y_k), y_k) - \Phi(x_*, y_*) + \Phi(x_k, y_k) - \Phi(x(y_k), y_k)\|$. (3.11)

Using the Newton-Leibniz formula (see Appendix I), we obtain

$\Phi(x(y_k), y_k) - \Phi(x_*, y_*) = \int_0^1 \frac{d\Phi(x(\tilde y), \tilde y)}{dy} (y_k - y_*)\, d\alpha$.

Here $\tilde y = y_* + \alpha (y_k - y_*)$, the total derivative of $\Phi$ in $y$ is computed as in the proof of Theorem 4.3.1:

$\frac{d\Phi(x(y), y)}{dy} = B(x(y), y)$.

We use the Cauchy inequality and assume that the matrix norm is compatible with the Euclidean norm. Then

$\|\Phi(x(y_k), y_k) - \Phi(x_*, y_*)\| \le r \|y_k - y_*\|$,

where

$r = \sup_{y \in G(y_*)} \|B(x(y), y)\|$.

From this and (3.8), (3.11) we obtain

$\|y_{k+1} - y_*\| \le r \|y_k - y_*\| + l \|x(y_k) - x_k\|$. (3.12)

Let us estimate the norm of $q = x(y_k) - x_k$. Using Taylor's formula, we have

$H(x(y_k), y_k) = H(x_k, y_k) + \langle H_x(x_k, y_k), q \rangle + \frac{1}{2} q^T H_{xx}(\tilde x, y_k) q$,
$H(x_k, y_k) = H(x(y_k), y_k) + \frac{1}{2} q^T H_{xx}(\hat x, y_k) q$,

where $\tilde x = x_k + \theta_1 q$, $\hat x = x(y_k) - \theta_2 q$, $0 \le \theta_1 \le 1$, $0 \le \theta_2 \le 1$, and the condition (2.3) is used. Let us combine the formulas. Then

$\langle H_x(x_k, y_k), q \rangle = -\frac{1}{2} q^T [H_{xx}(\tilde x, y_k) + H_{xx}(\hat x, y_k)] q$.

Using the Cauchy inequality and (3.9), (3.10), we obtain

$d_* \|q\| \le \|H_x(x_k, y_k)\|$. (3.13)

Noting (3.7), we arrive at the inequality

$\|x(y_k) - x_k\| \le \frac{\varepsilon_k}{d_*}$.

Substituting this into (3.12), we have the estimate

$\|y_{k+1} - y_*\| \le r \|y_k - y_*\| + \frac{l \varepsilon_k}{d_*}$. (3.14)

This implies that to ensure rapid convergence it is necessary to decrease in a particular way the coefficient $\varepsilon_k$, letting it go to zero.
Another way of adjusting the accuracy of solving the unconstrained minimization problem (2.3) is to find a point $x = x_k$ satisfying

$\|H_x(x_k, y_k)\| \le \varepsilon_k \|\Phi(x_k, y_k) - y_k\|$. (3.15)

The following estimate holds:

$C = \|\Phi(x_k, y_k) - y_k\| = \|\Phi(x(y_k), y_k) - \Phi(x_*, y_*) + \Phi(x_k, y_k) - \Phi(x(y_k), y_k) + y_* - y_k\| \le (1 + r) \|y_k - y_*\| + l \|x(y_k) - x_k\|$.

Using (3.13) and (3.15), we obtain

$C \le (1 + r) \|y_k - y_*\| + \frac{l \varepsilon_k}{d_*} C$.

Whence

$C \le \frac{d_* (1 + r)}{d_* - l \varepsilon_k} \|y_k - y_*\|$.

Substituting this into (3.15), we have

$\|H_x(x_k, y_k)\| \le \frac{d_* (1 + r) \varepsilon_k}{d_* - l \varepsilon_k} \|y_k - y_*\|$.

From (3.13) we obtain

$\|x(y_k) - x_k\| \le \frac{(1 + r) \varepsilon_k}{d_* - l \varepsilon_k} \|y_k - y_*\|$.

From (3.12) we find the final formula

$\|y_{k+1} - y_*\| \le Q \|y_k - y_*\|$, $Q = \frac{r d_* + l \varepsilon_k}{d_* - l \varepsilon_k}$. (3.16)

For local convergence it suffices that $Q < 1$, which is true if $r < 1$ and $\varepsilon_k$ is sufficiently small:

$\varepsilon_k < \frac{d_* (1 - r)}{2 l}$.

When this condition is satisfied the method (2.14), with regulation of the accuracy for solving the auxiliary problem (3.15), converges with a geometric rate of convergence. In contrast to the rule (3.7), one does not need to have $\varepsilon_k$ tend to zero. If in (3.7), (3.15) $\varepsilon_k = 0$, then by (3.14), (3.16), $Q$ is a multiplier of the process (2.14) equal to $r$ at the point $y_*$, which corresponds to the formula (2.3.9).

The estimates obtained show that the method (2.14) is stable under errors in solving the auxiliary problem (2.3). This property and also the relatively high rate of convergence make the method (2.14) a very effective tool for solving nonlinear programming problems. The method is ordinarily used after a satisfactory estimate has been derived for the Lagrange multipliers.

In order to justify the method, the existence of a Kuhn-Tucker point in the problem (1.6.1) was postulated. Apparently, one can prove that the method converges under more relaxed assumptions without the requirement, in particular, of the existence of bounded Lagrange multipliers. In this case, when (2.14) is used, $\|y_k\| \to \infty$ as $k \to \infty$; however, $x_k$ will converge to $x_*$.
As an illustration we consider the problem (3.1.5). If we use 
o', then the method (2.14) leads to the process 
2 Bee 
ee a e 
Widens os is determined from the condition 


k 


X, == Arg min [ete ye| 
ee : 


For small values | x, | << 1 we have 
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| 1 
Fert) 2 cape tee Chr Fae 


AS? “Ul. + @O=e we have exoeae Ole =a. 
* 


3. THE SCALING


To guarantee the condition of Theorems 4.3.1 and 4.3.2 that $\Delta_1$ be sufficiently large, while preserving the elements of the matrix $\Phi_y$ and without changing the form of the functions $\varphi$ and $\psi$, it is convenient to use instead of $\varphi$ and $\psi$ the "scaled" functions

$\tilde\varphi(g(x), u) = \frac{\varphi(\tau g(x), u)}{\tau}$,

$\tilde\psi(h(x), v) = \frac{\psi(\tau h(x), v)}{\tau}$.

The derivatives of the new functions can simply be expressed in terms of those of the initial functions:

$\tilde\varphi_g(g, u) = \varphi_{\hat g}(\hat g, u)$, $\tilde\psi_h(h, v) = \psi_{\hat h}(\hat h, v)$,
$\tilde\varphi_{gg}(g, u) = \tau \varphi_{\hat g \hat g}(\hat g, u)$, $\tilde\varphi_{gu}(g, u) = \varphi_{\hat g u}(\hat g, u)$.

Here $\hat g$ means that the initial function $\varphi(g, u)$ is differentiated with respect to $g$, and as a next step, the value $\tau g$ is taken for $g$ in the resulting formula. The derivatives of $\psi$ are computed similarly. Let us give now the basic computing formulas taking account of scaling:

$H_{xx}(x_*, y_*) = L_{xx}(x_*, y_*) + \tau R_x(x_*) \Phi_x^T(x_*, y_*)$,
$B(x_*, y_*) = \Phi_y(x_*, y_*) - \tau \Phi_x^T(x_*, y_*) H_{xx}^{-1}(x_*, y_*) H_{xy}(x_*, y_*)$,
EE (0, uy) 


0 
0 VE (h (Xe); ml 





Peg (0s Uy) &x a 


Ditie, Us) edhe ae ae 
sree) ements 
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The equation (3.4) is replaced by 


[Dp (ee, Ys) Lae 
— He [O43 (x0, Ys) JT LZ (xe, Yo) R2 (%er Ye) —A1 gg |= 0. Leen d 


, 


The computing method (2.14) remains the same except that instead of $g$ and $h$ we take $\tau g$ and $\tau h$, respectively. Suppose there exists a scalar $d$ such that for all $i \in [1:e]$, $j \in \sigma(x_*)$,

$d = \varphi_{gg}(0, u^i_*) = \psi_{hh}(0, v^j_*)$.


Comparing the equations (3.4) and (3.17), we reach the conclusion 


that a simple relationship exists between their roots: 


x Ad 
ire a TTT Ay? coe 


and there is no root } =d. Hence taking t sufficiently large, 
we can guarantee that the condition |a| <1 be satisfied. This 


ILS) IRON IO) Iei@llel ati 


max adated tod 
d 2 
ea 
a 


where AG are tnew KOOUS Om (Cond Dia. 

From (3.18) we see that as $\tau \to \infty$ the $|\tilde\lambda|$ tend to zero, i.e., the convergence is superlinear. However, one should bear in mind that for large values of $\tau$ the function $H$ has many valleys and ridges, thus making the solution of the problem (2.3) more time-consuming, and the improvement is slow. Hence it is not practical to take large values of $\tau$. At the same time, for small values of $\tau$ the conditions of Theorems 4.3.1 and 4.3.2 may be violated.

[Figure 2: number of function evaluations $N$ versus the parameter $\tau$]

We give the results of numerical computations of the nonlinear programming problem (7.5.1). In Fig. 2 we show the dependence of $N$, that is the number of function evaluations, on the parameter $\tau$ which is constant in each separate computation. The value $N$ is proportional to the computation time. Using the
method (2.14) with the functions $\varphi$ and $\psi$, the computations were made for 12 different values of $\tau$ and interpolation was carried out at intermediate points. The accuracy of solving the
auxiliary problem (2.3) on each iteration was given by the condi- 
tion Hx, u,v) || S-€,4) where. .61,= O.dsace50= 0.01. "S For cach 
value of $\tau$ the same initial point $x_0$ was taken, the same parameters of the methods were used, and the problem (2.3) was solved by the conjugate gradient method. The dependence was not monotonic.
The problem was solved most quickly when the values of $\tau$ were in the medium range. At the same time, in problems in which one cannot achieve a high accuracy in determining $x_k$, the coefficient $\tau$ ought to be increased; its role will be analogous to that of a penalty coefficient.


4. PARTICULAR CASES


For the functions $\psi$ introduced above we have the following:


Vin (AY (4), 04) = vleM*e), 

Pho (M/ (xg), vb) = eh/*), 

Win (A! (Xe), vf) = vl [1 AY (x.) 2 (1+), 

Pho (h’ (Xe), Ub) = (LAs (x,))7E-7, 
tphn (AY (Xe), 06) = Br [Ad (x4) + of} — 20h! (x4) [1 + [M/ (x6) P17, 
Who (A7 (Xe), 04) = 37 [[h? (x4) +04 ]% —(vh)?] + [1 + [A/ (x) ]2] 72. 


Let the strict complementarity condition be satisfied at the point
$[x_*, v_*]$. For these functions, if $j \in \sigma(x_*)$, then all


Vane Gy), vi) > 0 and if 3 # o(x,) then all 
0 -< Yu nia, vi) ee Le Van (2 CX) v2) = 0, as is required 
in the conditions of Theorem 4.3.2. 
For hae ore ae we have 
Ay (ey User Us) = Pgg (Bi (Xe), Ut) = 1, 
As asGe) = Peale (Xs), Hi). 


For the remaining functions we have 


he (el (ae), us) = ae 4)’, Cia (ei (%), wh) =1, 
Pie (Bi (xs), wl) =1+ecosud, ug! (xe), uh) =I, 
oe (e! (xe), ut) =a" (0)B (us), p(B (xe), uf) =1, 


Wee BilXe)s tee) = Ve (Xely Ue), Oni (re), Ue) = |. 
This implies that for all these functions the conditions of Theorems 4.3.1 and 4.3.2, $\Delta_1(x_*, u_*, v_*) > 0$, $\Delta_2(x_*, v_*) \ge 0$, are satisfied, and that the quantity $\Delta_1$ can be made arbitrarily large by using scaling.


If for ©, wv) we take i and >, respectively, then 


H(x, u, =F (x)+ Dalat + gi (x)] gi (x)+ 


$F De + (sy —Aeh (s) (+ 


sae { th/ (x) act) HACK) > 0, 
TAT larctg th/(x) if W(x) <0. 


The method (2.14) with scaling takes the form 


Uber = Ub+ tg! (x4), 
he 1 =r [(vk-+ th! (x,))8 —(vk)?] + 


(3.19) 
+ op [1 + [AL (x)? ]7- 


If we use i and o>, taking! ro= lesandsay(a):= to then we 


obtain 


di (S34 20) = 


=F@+¥ [wt sai) |e @ +h le (hi, (x))* + 
t= ail 
of [Lp th! (x) + (ch! (x))* + (chi (x) Ak A(x) DO, 
T \(1—th/ (x))-3, if W(x)<0 | 
Wei = Ua tg! (xq), Veer =4 [thd (x,) + 3 
tui tr 2th! (x,) +3 (thi (x,))? if A’ (x_) 0, 
k 


C3520) 
[1 —th/ (x,)]~? if i {x,) <0: 


) 


The simplest computations result when ie and ye are used: 
H (x, u, v) = f (x) +


c 
+2 [et ge'o] eta Doty, OPP 
= nel 
Weer = Web g! (Xp), Ohar = (Oh + tH! (x4) 4+ 

This case is extensively investigated in the literature on modi- 
fied Lagrangians. In contrast to the preceding two cases, the 
function H has no second derivatives, which narrows the class of 
methods of unconstrained minimization applicable to solving the 
auxiliary problem (2.3); in particular, we have to give up Newton's 
method. The methods (3.19) - (3.21) have been used widely for 


solving nonlinear programming problems and are now part of the 


standard program library for DISO. 


4. SOLUTION OF CONVEX PROGRAMMING PROBLEMS


1. THE USE OF THE SIMPLE ITERATION METHOD TO SOLVE CONVEX PROGRAMMING PROBLEMS


In most works on modified Lagrangians, the auxiliary functions
$H(x,u,v)$ are constructed so that the Kuhn-Tucker point is a
saddle point. This requirement is essential when, in order to
solve the initial problem, one intends to use the saddle-point 
methods. We will use the simple iteration method to solve the 
problem (1.6.1); therefore the condition for a modified Lagrangian 
to have a saddle point is not necessary. In particular, if we use 
the function 2 defined in Section 4.2, then the modified 


Lagrangian will have the form 


Hx, )=FO+Y [r(bh @)) + v/e" 0) 
It is easily seen that for any vector $x$

$$\sup_{v \in E^c} H(x, v) = \sup_{v \in E^c_+} H(x, v) = +\infty$$

and, therefore, the left-hand inequality in the saddle-point condition (1.6.9) never holds; nevertheless, this function is quite efficient in numerical calculations.

Let (1.6.1) be a convex programming problem. The set $X_0$ defined by (1.6.3) is not empty and the function $f(x)$ is uniformly convex in $x$, i.e., there exists a scalar $\theta > 0$ such that for any $x, y \in E^n$ and $f_x \in \partial f(x)$ we have the inequality

$$f(y) \ge f(x) + \langle f_x, y - x \rangle + \theta \|y - x\|^2. \qquad (4.1)$$


To simplify the presentation we assume that in (1.6.1) there are no equality-type constraints. Then the basic computing formulas (2.3) and (2.14) will become

$$v_{k+1} = \psi_h(h(x_k), v_k), \qquad (4.2)$$

$$x_k \in \operatorname{Arg\,min}_{x \in E^n} H(x, v_k), \qquad (4.3)$$

$$H(x, v) = f(x) + \sum_{j=1}^{c} \psi(h^j(x), v^j).$$
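The iteration (4.2), (4.3) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the exponential function $\psi(h, v) = v e^{h}$, a hypothetical two-dimensional test problem (minimize $(x_1 - 2)^2 + (x_2 - 1)^2$ subject to $x_1 + x_2 - 2 \le 0$), and plain gradient descent with a fixed step for the inner problem (4.3). None of these choices comes from the book.

```python
import math

def f_grad(x):
    """Gradient of the hypothetical objective f(x) = (x1 - 2)**2 + (x2 - 1)**2."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]

def h(x):
    """Single inequality constraint h(x) = x1 + x2 - 2 <= 0."""
    return x[0] + x[1] - 2.0

H_GRAD = [1.0, 1.0]  # gradient of h (constant because h is linear)

def minimize_H(x, v, step=0.1, tol=1e-12, max_steps=10000):
    """Approximate rule (4.3): gradient descent on H(x, v) = f(x) + v*exp(h(x))."""
    x = list(x)
    for _ in range(max_steps):
        w = v * math.exp(h(x))                              # psi_h(h(x), v)
        g = [fg + w * hg for fg, hg in zip(f_grad(x), H_GRAD)]
        if math.hypot(g[0], g[1]) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def simple_iteration(x0, v0, outer=60):
    """Alternate the inner minimization (4.3) with the multiplier update (4.2)."""
    x, v = list(x0), v0
    for _ in range(outer):
        x = minimize_H(x, v)            # x_k minimizes H(., v_k)
        v = v * math.exp(h(x))          # v_{k+1} = psi_h(h(x_k), v_k)
    return x, v

x, v = simple_iteration([0.0, 0.0], 0.5)
print(x, v)
```

For this test problem the Kuhn-Tucker point is $x_* = (1.5, 0.5)$ with multiplier $v_* = 1$, which the iterates approach.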


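In the first simplified version of the method we take as $x_k$ an arbitrary vector satisfying the condition

$$\|\partial_x H(x_k, v_k)\|^2 \le \delta \sum_{j=1}^{c} h^j(x_k)\,[\psi_h(h^j(x_k), v_k^j) - v_k^j]. \qquad (4.4)$$

In the second version, $x_k$ satisfies the inequality

$$\|\partial_x H(x_k, v_k)\|^2 \le \delta \sum_{j=1}^{c} [\psi_h(h^j(x_k), v_k^j)\,h^j(x_k) - \psi(h^j(x_k), v_k^j) + \psi(0, v_k^j)], \qquad (4.5)$$

where $\delta$ is some nonnegative number. We will show in the sequel that

$$\sum_{j=1}^{c} h^j(x_k)\,[\psi_h(h^j(x_k), v_k^j) - v_k^j] \ge \sum_{j=1}^{c} [\psi_h(h^j(x_k), v_k^j)\,h^j(x_k) - \psi(h^j(x_k), v_k^j) + \psi(0, v_k^j)] \ge 0. \qquad (4.6)$$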


From these inequalities it follows that each point satisfying (4.5) will simultaneously satisfy the condition (4.4). This makes it easier to solve the problem of finding $x_k$ for (4.4).

In all versions of the method, if we get $v_{k+1} = v_k$, the computing process is terminated and the corresponding vector $x_k$ will be a solution of the problem (1.6.1).


2. PROOF OF CONVERGENCE


Consider the problem of finding 


Ve, =sup* intel (x, v), 
pee, tek 
If in the problem (1.6.1) the set $X_*$ is nonempty and Slater's CQ


is satisfied, then the set 


Ve ‘e ee (xO); ane 


is also nonempty. 

THEOREM 4.4.1. In the convex programming problem (1.6.1) let 
there ¢xist a point x, <= X,, let conditions Ag, As be satis- 
LLedaaletuct 1) hold torsanvomve> Om lets themtunct tons (hey ie be 
differentiable and eouvex ain Il, ieee =O = @ < Ay, Ghavel ene Ehoky 
vector having nonnegative components be taken as Vo: Then the 


sequence $\{x_k\}$ resulting from the simple iteration method (4.2) for any rule (4.3) - (4.5) for finding $x_k$ either terminates within a finite number of steps with $x_*$ or has $x_*$ as a limit point as $k \to \infty$. If Slater's CQ is satisfied, then the sequence $\{v_k\}$ is bounded, all of its limit points belong to W and at least one such point exists.

Proof. From the condition $v_0 \ge 0$ and $A_2$ it follows that all $v_k \ge 0$. We shall extensively use this property in the sequel


without mentioning it. The function $\psi(h^j, v^j)$ is convex in $h^j$ and therefore, using (1.2.14), for any $h^j$ and $\bar h^j$ we have

$$\psi_h(\bar h^j, v^j)(h^j - \bar h^j) \le \psi(h^j, v^j) - \psi(\bar h^j, v^j) \le \psi_h(h^j, v^j)(h^j - \bar h^j).$$

Setting $\bar h^j = 0$ in these inequalities and noting that $\psi_h(0, v^j) = v^j$ by the conditions imposed on $\psi$, we obtain

$$\psi_h(h^j, v^j)\,h^j \ge \psi(h^j, v^j) - \psi(0, v^j) \ge v^j h^j. \qquad (4.7)$$

It follows that for all $j \in [1:c]$,

$$h^j\,[\psi_h(h^j, v^j) - v^j] \ge 0.$$

Substituting the estimate for $v^j h^j$ from (4.7), we obtain

$$h^j\,[\psi_h(h^j, v^j) - v^j] \ge \psi_h(h^j, v^j)\,h^j + \psi(0, v^j) - \psi(h^j, v^j) \ge 0.$$


Summing up these inequalities for j e« [1:c], we arrive at the 
inequalities (4.6). From (4.7) and As it follows that 

$$0 \le \psi_h(h^j, v^j) \le v^j \quad \text{for } h^j < 0,$$

$$0 \le v^j \le \psi_h(h^j, v^j) \quad \text{for } h^j > 0.$$
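The inequalities (4.7) and the sign relation $h^j[\psi_h(h^j, v^j) - v^j] \ge 0$ are easy to check numerically for a concrete function. The sketch below assumes the exponential function $\psi(h, v) = v e^{h}$ (an illustrative choice, convex in $h$ with $\psi_h(0, v) = v$) and tests the inequalities on a grid of values; the grid and tolerances are arbitrary.

```python
import math

def psi(h, v):
    """Assumed example: psi(h, v) = v * exp(h)."""
    return v * math.exp(h)

def psi_h(h, v):
    """Derivative of psi with respect to h."""
    return v * math.exp(h)

def check_4_7(h, v, tol=1e-9):
    """psi_h(h,v)*h >= psi(h,v) - psi(0,v) >= v*h, up to rounding."""
    upper = psi_h(h, v) * h
    middle = psi(h, v) - psi(0.0, v)
    lower = v * h
    return upper >= middle - tol and middle >= lower - tol

def check_sign(h, v, tol=1e-12):
    """h * (psi_h(h,v) - v) >= 0, i.e. each term of the sum in (4.8) is nonnegative."""
    return h * (psi_h(h, v) - v) >= -tol

grid_h = [k / 10.0 for k in range(-50, 51)]   # h in [-5, 5]
grid_v = [k / 4.0 for k in range(0, 41)]      # v in [0, 10]
all_ok = all(check_4_7(h, v) and check_sign(h, v) for h in grid_h for v in grid_v)
print(all_ok)
```

A grid check is of course no proof; it only confirms that the chosen $\psi$ behaves as the convexity argument predicts.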
By the conditions imposed on $\psi$, if $v^j \ge 0$ then

$$(\psi_h(h^j, v^j) - v^j)\,h^j = 0$$

if and only if

$$h^j \le 0, \quad v^j h^j = 0.$$

If the last two conditions are satisfied, then $\psi_h(h^j, v^j) = v^j$.

By the assumptions on $\psi$, $\psi(h^j, v^j)$ is a nondecreasing function of $h^j$. Then $\psi(h(x), v_k)$ is convex in $x$. This follows from the obvious inequalities

$$h(\lambda x_1 + (1 - \lambda) x_2) \le \lambda h(x_1) + (1 - \lambda) h(x_2),$$

$$\psi(h(\lambda x_1 + (1 - \lambda) x_2), v_k) \le \psi(\lambda h(x_1) + (1 - \lambda) h(x_2), v_k) \le \lambda \psi(h(x_1), v_k) + (1 - \lambda) \psi(h(x_2), v_k).$$


Hence $H(x, v_k)$ is also convex in $x$ and by (4.1) is uniformly convex, i.e., for any $\bar x$, $x$, $H_x \in \partial_x H(\bar x, v_k)$ the inequality

$$H(x, v_k) \ge H(\bar x, v_k) + \langle H_x, x - \bar x \rangle + \theta \|x - \bar x\|^2$$

holds. We conclude that on each iteration (4.2) the function $H(x, v_k)$ attains its lower bound in $x$ at a unique point. Hence the rule (4.3) uniquely determines $x_k$. From (4.6) it follows that we can find $x_k$ satisfying (4.4) and (4.5).


k 


The following formulas are obvious:

$$L(x_k, v_{k+1}) = L(x_k, v_k) + \xi(x_k, v_k),$$

$$\xi(x_k, v_k) = \sum_{j=1}^{c} h^j(x_k)\,[\psi_h(h^j(x_k), v_k^j) - v_k^j], \qquad (4.8)$$

$$\partial_x H(x_k, v_k) = \partial_x L(x_k, v_{k+1}).$$


From the first equality and (4.7) it follows that

$$L(x_k, v_{k+1}) \ge L(x_k, v_k) \ge R(v_k), \qquad (4.9)$$

where $L(x_k, v_{k+1}) > L(x_k, v_k)$ if strict inequality holds in at least one of the inequalities (4.7).


Consider the method (4.2) in the case where $x_k$ is determined by the rule (4.3). The functions $H(x, v_k)$ and $L(x, v_{k+1})$ are convex in $x$, hence we can write that

$$\langle \partial_x H(x_k, v_k), x - x_k \rangle \le H(x, v_k) - H(x_k, v_k),$$

$$\langle \partial_x L(x_k, v_{k+1}), x - x_k \rangle \le L(x, v_{k+1}) - L(x_k, v_{k+1}).$$


The rule (4.2) for changing the vector $v$ implies the equality (4.8), according to which the set of subgradients of $H$ at the point $[x_k, v_k]$ and the set of subgradients of the function $L$ at the point $[x_k, v_{k+1}]$ coincide. The null vector belongs to the set $\partial_x H(x_k, v_k)$ since, by (4.3), the minimum of the function $H(x, v_k)$ is attained at $x_k$. Hence the null vector lies in $\partial_x L(x_k, v_{k+1})$. This implies an important assertion:

$$x_k \in \operatorname{Arg\,min}_{x \in E^n} L(x, v_{k+1}). \qquad (4.10)$$

For any $v_{k+1}$ we have

$$f(x_*) \ge f(x_*) + \sum_{j=1}^{c} v_{k+1}^j h^j(x_*) = L(x_*, v_{k+1}),$$

which together with (4.9) and (4.10) yields

$$f(x_*) \ge L(x_*, v_{k+1}) \ge R(v_{k+1}) = L(x_k, v_{k+1}) = L(x_k, v_k) + \xi(x_k, v_k) \ge L(x_k, v_k) \ge R(v_k). \qquad (4.11)$$


Thus the sequence $\{R(v_k)\}$ is monotonically increasing. We now consider the problem whether this sequence is strictly monotonically increasing.
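The monotone growth of the dual values $R(v_k)$ can be observed directly on a small example. The sketch below is an illustration under assumptions not taken from the book: the toy problem of minimizing $x^2$ subject to $1 - x \le 0$, the exponential function $\psi(h, v) = v e^{h}$, and bisection for the inner minimization. For this problem $R(v) = \min_x [x^2 + v(1 - x)] = v - v^2/4$, with maximum value $f(x_*) = 1$ at $v_* = 2$.

```python
import math

def inner_min(v):
    """Minimizer of x**2 + v*exp(1 - x): solve 2x = v*exp(1 - x) by bisection."""
    lo, hi = 0.0, 1.0 + v              # the derivative changes sign on [0, 1 + v] for v > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * mid - v * math.exp(1.0 - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dual_value(v):
    """R(v) = min over x of the ordinary Lagrangian x**2 + v*(1 - x)."""
    return v - 0.25 * v * v

def dual_values(v0=1.0, iters=15):
    """Run the simple iteration and record R(v_k) at every step."""
    v, values = v0, [dual_value(v0)]
    for _ in range(iters):
        x = inner_min(v)               # rule (4.3)
        v = v * math.exp(1.0 - x)      # update (4.2) with psi_h(h, v) = v*exp(h)
        values.append(dual_value(v))
    return values

values = dual_values()
print(values)
```

The recorded values increase at every step and stay below $f(x_*) = 1$, in line with the chain of inequalities (4.11).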
Suppose $v_{k+1} = v_k$ on the $k$th step of the iterative process (4.2). Then

$$v_k = \psi_h(h(x_k), v_k) \qquad (4.12)$$

and, by the conditions imposed on $\psi$, the conditions

$$h(x_k) \le 0, \quad v_k \ge 0, \quad h^j(x_k)\,v_k^j = 0 \qquad (4.13)$$


are satisfied. Furthermore, it follows from (4.11) that 
$$R(v_{k+1}) = R(v_k) = \min_{x \in E^n} L(x, v_k) \qquad (4.14)$$


and for any x <€ X the inequality 
c c 


Dd Ad (x) oh DH AJ (xy)oh =0 


i=! j=l 


is satisfied. Hence using Theorem 1.6.4 we have that x= tig 
Cn eu Cum MOG Ca tay) SOlwWies ClO.) 1s) med Nine KeumEs Ce) Spammers alnelhe, tlk yee 


R(v = R(v,), then) by C4 estorvadiees) <= \[t Jeu ewe have 


k+1? 


hd (xp) [tpn (A7 (xe), ve) —ve] =0. 


, 
But, as was shown above, these equalities imply (4.12) and $x_k \in X_*$. Hence, if (4.2) generates an infinite sequence, then the sequence $\{R(v_k)\}$ is strictly monotonically increasing and is bounded from above by $f(x_*)$. Hence it is necessary that the values of $R(v_k)$ tend to some limit $d$ while remaining less than this limit for each $k$. All terms of the function $\xi(x_k, v_k)$ are nonnegative; therefore they must tend to zero. We thus obtain
tg hg) eee tk) ade] (5), 


$$\lim_{k \to \infty} \xi(x_k, v_k) = 0, \qquad \lim_{k \to \infty} h^j(x_k)\,v_k^j = 0. \qquad (4.15)$$


From (4.1) it follows that $L(x, v_{k+1})$ is strictly convex in $x$, and for any $x \in E^n$ and $L_x \in \partial_x L(x_k, v_{k+1})$ we have the inequality

$$L(x_k, v_{k+1}) + \langle L_x, x - x_k \rangle + \theta \|x - x_k\|^2 \le L(x, v_{k+1}). \qquad (4.16)$$

For $L_x$ we take the null vector, and for $x$ the vector $x_*$. Then

$$\theta \|x_k - x_*\|^2 \le L(x_*, v_{k+1}) - L(x_k, v_{k+1}) \le f(x_*) - R(v_k).$$

Thus, the sequence $\{x_k\}$ is bounded.


Let us show that $d = f(x_*)$. Suppose that some subsequence $\{x_{k_s}\}$ converges to a point $\bar x \ne x_*$. If $h^j(\bar x) \ne 0$ then $\lim v_{k_s}^j = 0$, and the case $h^j(\bar x) = a > 0$ is impossible since then we should have $\psi_h(a, 0) = 0$, which contradicts the conditions imposed on $\psi$. Hence $\bar x \in X$, automatically yielding $\bar x \in X_*$, $f(\bar x) = d$. The components $v_k^j$ for which $h^j(x_*) = 0$ can tend to infinity. We shall show below that they are bounded if Slater's CQ is satisfied.

Now we consider the method (4.2) when $x_k$ is found by (4.4). In (4.16), we fix the vector $L_x$ and minimize the left and right sides of this inequality with respect to $x$. The minimum is attained on the left-hand side for

$$x = x_k - \frac{1}{2\theta} L_x.$$

Hence from (4.16) it follows that

$$L(x_*, v_{k+1}) \ge R(v_{k+1}) \ge L(x_k, v_{k+1}) - \frac{1}{4\theta} \|L_x\|^2.$$

For $L_x$ we take the vector in $\partial_x H(x_k, v_k)$ for which (4.4) is satisfied. Noting (4.9), we obtain

$$R(v_{k+1}) \ge L(x_k, v_k) + \left(1 - \frac{\delta}{4\theta}\right) \xi(x_k, v_k) \ge L(x_k, v_k) \ge R(v_k).$$


Just as in the preceding case, if $R(v_{k+1}) = R(v_k)$, then a solution of the problem (1.6.1) has been found. If the sequence $\{x_k, v_k\}$ is infinite, then $\{R(v_k)\}$ is strictly monotonically increasing and the conditions (4.15) hold.


Setting $x = x_*$ in (4.16), we have

$$\theta \|x_* - x_k\|^2 \le L(x_*, v_{k+1}) - L(x_k, v_{k+1}) + \langle L_x, x_k - x_* \rangle \le f(x_*) - R(v_k) + \|L_x\|\,\|x_k - x_*\|.$$

Noting (4.9), (4.4), we find that

$$\theta \|x_k - x_*\|^2 \le f(x_*) - R(v_k) + \|x_k - x_*\| \sqrt{\delta\,\xi(x_k, v_k)}.$$

Noting (4.15), we conclude that $\lim_{k \to \infty} x_k = x_*$.


Let X, #9 and let us show that {v,} is bounded. From 


the conditions 


lim h(xpg)<0, lim A/ (xg) ok =0 
Rk —> 00 kR—+> © 


: , 
it follows that if for some subsequence {vz} the condition 
lim v2 = © is satisfied, then lim sere =i h(a Omarion a 
Ko k K-00 
j sci) wCsee: Definitions 7.3% of thevset:(o€x))s!i ae By, Giil0) 
fOF an arbitrary, vector .-x we have 
ec c 
fet & oght (xp-1) <f (x) + > fh (x). 
J= f=1 
Let us divide both sides of this inequality by Iva. lf 
lim ||v_|| =o, then we can find a set of numbers a., Oo. = 1; 
R700 eOCRy) J 
such that 


= S ajhl (x) < ’ a ;hJ (x) 


JEG (X4) JE O (Xs) 


for any $x$. But this inequality contradicts Karlin's constraint qualification (see Definition 1.6.7), which in this case is equivalent to Slater's constraint qualification (see Lemma 1.6.2). Hence the sequence $\{\|v_k\|\}$ is bounded. At each limit point of the sequence $\{x_k, v_k\}$, the Kuhn-Tucker necessary conditions for a minimum (which are also sufficient in this problem) are satisfied.

The inequalities (4.6) imply the convergence of the method (4.2) with $x_k$ chosen from the condition (4.5). ///


5. REDUCTION TO A MAXIMIN PROBLEM 


1. PRELIMINARY RESULTS 


The methods described in Sections 4.2 - 4.4 are based on Theorem
1.6.4 and the function (1.6.15) is such that the vector $y \in E^m$
at the Kuhn-Tucker point coincides with the Lagrange multipliers
and, furthermore, the numerical methods are based on the solution
of the system (1.6.17). In this section, we reduce our problem to
a maximin problem shallosous LO) Wl oe)), 6 DUE ne IcContmastuento them iha th 
ter, we construct a modified Lagrangian instead of F(x,u,w) so 

as LOMeuarantee thay foreach point of the, local maximin ite) cor= 
responding point [x,u,v] is a Kuhn-Tucker point for the initial 
problem (1.6.1). In this case the role of dual vectors is played 
by the vectors which do not coincide with the Lagrange multipliers 


but are somehow expressed through them. We introduce the vectors 


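$$\lambda \in E^e, \quad \varphi(g, \lambda) = [\varphi(g^1, \lambda^1), \ldots, \varphi(g^e, \lambda^e)] \in E^e,$$

$$\mu \in E^c, \quad \psi(h, \mu) = [\psi(h^1, \mu^1), \ldots, \psi(h^c, \mu^c)] \in E^c.$$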


Just as was done before, $\varphi_g$, $\varphi_\lambda$, $\varphi_{gg}$, $\varphi_{g\lambda}$, $\varphi_{\lambda\lambda}$, $\psi_h$, $\psi_\mu$, $\psi_{hh}$, $\psi_{h\mu}$, $\psi_{\mu\mu}$ are the matrices of the first and second derivatives. Obviously, the matrices of the second derivatives are diagonal. A concrete element of the matrix is obtained if we give its scalar
gape! 
arguments, say, 275 sere 


Let us introduce a modified Lagrangian in the form (2.1):

$$M(x, \lambda, \mu) = f(x) + \sum_{i=1}^{e} \varphi(g^i(x), \lambda^i) + \sum_{j=1}^{c} \psi(h^j(x), \mu^j). \qquad (5.1)$$

In this section the index $i$ takes on integer values from 1 to $e$, the index $j$ takes on integer values from 1 to $c$. Throughout we assume that $\lambda$ and $\mu$ have only real components.

We impose the following conditions on $\varphi$ and $\psi$.


$A_6$. The equation

$$\varphi_\lambda(g^i, \lambda^i) = 0 \qquad (5.2)$$

can be satisfied iff $g^i = 0$; the equation (5.2) together with the condition

$$u^i = \varphi_g(0, \lambda^i) \qquad (5.3)$$

uniquely determines $\lambda^i$ for any given $u^i \in E^1$.

$A_7$. The equation

$$\psi_\mu(h^j, \mu^j) = 0 \qquad (5.4)$$

can be satisfied iff

$$h^j \le 0, \quad v^j = \psi_h(h^j, \mu^j) \ge 0, \quad h^j v^j = 0; \qquad (5.5)$$

the equation (5.4) together with the condition

$$v^j = \psi_h(h^j, \mu^j) \qquad (5.6)$$

uniquely determines $\mu^j$ for any given values $h^j$ and $v^j$ satisfying (5.5).
@ Ag. For any x, @nd, 7 ty) such that itor n’ Ge) and We
fj ter" (1) A “theveondition (5.5)"ss-satisfiedtand for- any. x ex 
we introduce the functions 


¥ vw, wh) < ¥ ver ), w). 


p= 


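Condition $A_6$ is satisfied, for example, by the functions $\varphi^1, \ldots, \varphi^4$ defined in Section 4.2. In addition, we give the functions

$$\varphi^5(g^i, \lambda^i) = g^i \operatorname{sh} \lambda^i + (g^i)^2/2,$$

$$\varphi^6(g^i, \lambda^i) = g^i (\lambda^i + \varepsilon \sin \lambda^i) + (g^i)^2/2, \quad -1 < \varepsilon < 1.$$

From (5.3) it follows that for the functions $\varphi^1, \ldots, \varphi^4$ one has $\lambda = u$, i.e., the vector $\lambda$ coincides with the Lagrange multipliers. For $\varphi^5$ and $\varphi^6$ the equation (5.3) has the form

$$u^i = \operatorname{sh} \lambda^i, \qquad u^i = \lambda^i + \varepsilon \sin \lambda^i,$$

respectively, where the new vector $\lambda$ does not coincide with $u$.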
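For the two relations $u^i = \operatorname{sh} \lambda^i$ and $u^i = \lambda^i + \varepsilon \sin \lambda^i$, the requirement that $\lambda^i$ be uniquely determined by $u^i$ can be made concrete. The sketch below is only an illustration: the function names are hypothetical, the first relation is inverted with the inverse hyperbolic sine, and the second is inverted by bisection, which works because $\lambda + \varepsilon \sin \lambda$ is strictly increasing when $|\varepsilon| < 1$.

```python
import math

def lambda_from_u_sinh(u):
    """Invert u = sh(lambda): the unique solution is lambda = arsinh(u)."""
    return math.asinh(u)

def lambda_from_u_sin(u, eps):
    """Invert u = lambda + eps*sin(lambda) for |eps| < 1 by bisection.

    The left-hand side is strictly increasing (derivative 1 + eps*cos(lambda) > 0),
    and the root lies in [u - 1, u + 1] because |eps*sin(lambda)| < 1.
    """
    if not -1.0 < eps < 1.0:
        raise ValueError("eps must satisfy -1 < eps < 1")
    lo, hi = u - 1.0, u + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + eps * math.sin(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for u in (-3.0, 0.0, 2.5):
    print(u, lambda_from_u_sinh(u), lambda_from_u_sin(u, 0.5))
```

In both cases the recovered $\lambda$ differs from $u$ in general, which is the point made in the text: the dual variable $\lambda$ is a reparametrization of the Lagrange multiplier, not the multiplier itself.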


A simple function satisfying An and Ag is 


WM, Ww) =F +H —W))] 


for which the equation (5.4) has the form 


Get al eh page ge (5.7) 
Tae nd > 0, then for ae > O we have nd = 05) which contra— 
dicts the assumption nd > Of aor hd + J < O we have wd = 0, 
which contradicts the condition nd + yd < 0. Hence (5.7) cannot 
be satisiied fom uh’. > Owe Ifuch? <.0; thensfoumh) ious 10 mere 


have ud =p) ere ATsE hd + yd > 0 then hd = 0, which is impossible. 
Thus, (5.7) can be satisfied only for $h^j \le 0$, and if $h^j < 0$


then 
ice eh wie 0 hove oF 


if nh? = 0 then 

uw = 0. vi = qu)? me Cu )?, hJyJ = 0. 
Thus if (5.7) has a solution, the conditions (5.5) are satisfied, and the relation (5.6) determines the dependence $\mu^j = \sqrt{v^j}$ for $v^j \ge 0$. Hence, here the vector $\mu$ is also different from the Lagrange multipliers $v$.


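Let

$$q = [x, \lambda, \mu], \quad q_* = [x_*, \lambda_*, \mu_*], \quad M(q) = M(x, \lambda, \mu).$$

Then $q_*$ is called a stationary point of $M(q)$ if

$$M_x(q_*) = 0_{n1}, \quad M_\lambda(q_*) = 0_{e1}, \quad M_\mu(q_*) = 0_{c1}.$$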


LEMMA 4.5.1. Let the functions $f(x)$, $g(x)$, $h(x)$, $\varphi$ and $\psi$ be continuously differentiable in all their arguments and let conditions $A_6$ and $A_7$ be satisfied. If $z_* = [x_*, u_*, v_*]$ is a Kuhn-Tucker point in the problem (1.6.1), then the system

$$u_* = \varphi_g(g(x_*), \lambda_*), \quad v_* = \psi_h(h(x_*), \mu_*), \qquad (5.8)$$

$$\varphi_\lambda(g(x_*), \lambda_*) = 0, \quad \psi_\mu(h(x_*), \mu_*) = 0 \qquad (5.9)$$

uniquely determines $\lambda_*$ and $\mu_*$ such that $q_* = [x_*, \lambda_*, \mu_*]$ is a stationary point of the function $M(q)$.


Proof. Let $z_* = [x_*, u_*, v_*]$ be a Kuhn-Tucker point. Then

$$g(x_*) = 0, \quad h(x_*) \le 0, \quad L_x(z_*) = 0, \quad v_* \ge 0, \quad v_*^j h^j(x_*) = 0.$$

By $A_6$ and $A_7$ in this case the systems (5.8) and (5.9) uniquely determine $\lambda_*$ and $\mu_*$ such that $M_\lambda(q_*) = M_\mu(q_*) = 0$. Differentiating $M(q)$ with respect to $x$ at the point $q = q_*$ and noting (5.8) and
Cle (219) pe we moO Ditadon 

$$M_x(q_*) = f_x(x_*) + g_x(x_*)\,\varphi_g(g(x_*), \lambda_*) + h_x(x_*)\,\psi_h(h(x_*), \mu_*) = f_x(x_*) + g_x(x_*)\,u_* + h_x(x_*)\,v_* = L_x(z_*) = 0. \qquad (5.10)$$

Thus for the Kuhn-Tucker point $z_*$ the stationary point $q_* = [x_*, \lambda_*, \mu_*]$ of $M(q)$ is uniquely determined. ///


2. INVESTIGATION OF THE MAXIMIN PROBLEM 


We replace the initial problem (1.6.1) by the constrained maximin 


problem for $M(q)$:

$$\max_{\lambda \in E^e} \max_{\mu \in E^c} \min_{x \in E^n} M(x, \lambda, \mu). \qquad (5.11)$$


The next theorem is an analog of Theorem 4.2.1. 

THEOREM 4.5.1. Let the function $M(x, \lambda, \mu)$ be continuously differentiable in its arguments, let the functions $\varphi$ and $\psi$ satisfy conditions $A_6$, $A_7$ and $A_8$, and let the vector $q_* = [x_*, \lambda_*, \mu_*]$ be a strict local maximin point of (5.11). Then for the problem (1.6.1)

1. the point $x_*$ is a local solution;

2. the vectors $u_*$ and $v_*$ defined by (5.8) are such that $z_* = [x_*, u_*, v_*]$ is a Kuhn-Tucker point.
Proof. Reformulating the necessary minimax conditions (see Theorem 1.5.7) relative to the maximin problem (5.11), we obtain that the conditions (5.9) are satisfied for $q_*$. By conditions $A_6$ and $A_7$ it then follows that $x_* \in X$. For the interior problem in (5.11) for any $x \in G(x_*)$ we have

$$M(x_*, \lambda_*, \mu_*) = f(x_*) + \sum_{i=1}^{e} \varphi(g^i(x_*), \lambda_*^i) + \sum_{j=1}^{c} \psi(h^j(x_*), \mu_*^j) \le M(x, \lambda_*, \mu_*) = f(x) + \sum_{i=1}^{e} \varphi(g^i(x), \lambda_*^i) + \sum_{j=1}^{c} \psi(h^j(x), \mu_*^j).$$

Using condition $A_8$ we obtain that $f(x_*) \le f(x)$ for all $x \in G(x_*) \cap X$, whence we conclude that $x_*$ is a local solution of (1.6.1).

The vector $q_*$ is a stationary point of $M(q)$, hence the conditions (5.9) and (5.10) hold, implying that $z_*$ is a Kuhn-Tucker point in (1.6.1). ///

The condition of the Theorem that $q_*$ is a solution of the maximin problem can be weakened by requiring that $q_* = [x_*, \lambda_*, \mu_*]$ satisfy the relations

$$\varphi_\lambda(g(x_*), \lambda_*) = 0, \quad \psi_\mu(h(x_*), \mu_*) = 0, \quad x_* \in \operatorname{Arg\,min}_{x \in E^n} M(x, \lambda_*, \mu_*).$$

To solve this system, one may use the approach described in Section 4.2. Here we limit ourselves to a reduction to the problem (5.11).

To use the methods of finding the local maximin given in Section 2.6, we need to show that the function $M$ has properties similar to those formulated for the function $F(x,u,v)$ in Lemmas 4.1.1 and 4.1.2. To this end, we introduce the following additional assumptions concerning the choice of the functions $\varphi$, $\psi$ and partially concerning the class of problems (1.6.1).
$A_9$. For any values of the arguments, the function $\varphi(g, \lambda)$ is twice continuously differentiable, satisfies $A_6$, and at the stationary point $q_*$ of $M(q)$ we have the inequalities

$$\delta_1^i = \varphi_{gg}(0, \lambda_*^i) > 0, \quad \varphi_{g\lambda}(0, \lambda_*^i) \ne 0, \quad \varphi_{\lambda\lambda}(0, \lambda_*^i) \le 0.$$

$A_{10}$. For any values of the arguments, the function $\psi(h, \mu)$ is twice continuously differentiable, satisfies $A_7$, and at the stationary point $q_*$ of $M(q)$ for $j \in \sigma(x_*)$ we have the inequalities

$$\delta_2^j = \psi_{hh}(0, \mu_*^j) > 0, \quad \psi_{h\mu}(0, \mu_*^j) \ne 0, \quad \psi_{\mu\mu}(0, \mu_*^j) \le 0,$$

and for $j \notin \sigma(x_*)$ we have the relations

$$\psi_{hh}(h^j(x_*), \mu_*^j) \ge 0, \quad \psi_{h\mu}(h^j(x_*), \mu_*^j) = 0, \quad \psi_{\mu\mu}(h^j(x_*), \mu_*^j) < 0.$$

Conditions $A_9$ and $A_{10}$ impose restrictions on $\varphi$ and $\psi$ as well as on the dual variables for the specific problem (1.6.1). Condition $A_{10}$ makes it possible to avoid later the strict complementarity condition.


Let us) constructs three square matrices, of order, nmitm,e, nm 
and m, respectively: 


ip Mx ExP on AyWay 


Meade) =| Sig ls Chapin Gee whe 
Here all the coefficients of the matrices are computed at the stationary point $q_*$ of the function $M(q)$.


Just as in Section 4.3, we set

$$\Delta_1(x, \lambda, \mu) = \min_{i \in [1:e]} \min_{j \in [1:c]} [\varphi_{gg}(g^i(x), \lambda^i), \psi_{hh}(h^j(x), \mu^j)].$$


LEMMA 4.5.2. Let the sufficient conditions of Theorem 1.7.4 be satisfied at the Kuhn-Tucker point $z_* = [x_*, u_*, v_*]$ for the problem (1.6.1). Let the constraint qualification be satisfied at the point $x_*$. Let a stationary point $q_* = [x_*, \lambda_*, \mu_*]$ of $M(q)$ correspond to $x_*$ and let conditions $A_9$ and $A_{10}$ hold. If the quantity $\Delta_1(x_*, \lambda_*, \mu_*)$ is sufficiently large, then

1. the matrices $M_{xx}(q_*)$ and $N_1(q_*)$ are positive definite, the matrix $M_{qq}(q_*)$ is nonsingular;

2. all the roots $\lambda$ of the equation

$$|H_1(q_*) - \lambda I_{n+m}| = 0$$

have strictly negative real parts;

3. the point $q_*$ is a strict local maximin point for the problem (5.11).
Proof. Differentiating $M(q)$ in $x$ and using the notation introduced in conditions $A_9$ and $A_{10}$ leads to the formula

$$M_{xx}(q_*) = L_{xx}(z_*) + \sum_{i=1}^{e} \delta_1^i\, g_x^i(x_*)[g_x^i(x_*)]^T + \sum_{j \in \sigma(x_*)} \delta_2^j\, h_x^j(x_*)[h_x^j(x_*)]^T + \sum_{j \notin \sigma(x_*)} h_x^j(x_*)\,\psi_{hh}(h^j(x_*), \mu_*^j)\,[h_x^j(x_*)]^T.$$

By $A_{10}$ the last sum is nonnegative definite. From Finsler's Lemma 1.7.4 we obtain that the first three terms determine a positive definite matrix if $\Delta_1(x_*, \lambda_*, \mu_*)$ is sufficiently large. This implies that $M_{xx}(q_*)$ is positive definite.
One can assume without loss of generality that $h^j(x_*) = 0$ for $1 \le j \le s$, $h^j(x_*) < 0$ for $s+1 \le j \le c$. We introduce three auxiliary vectors:
V=[qig (0, 2), --. 
+» Pag (0, AE), Pun (0, we), »- +> Pun (0, ws) E%, 


ry ~~ [Pun (A (Xe), Bs), sens Pup (As (Xs), ws) ] c BE’, R=: e-+s, 
P= [yup (254? (Xe), Wat), «os Pun (Ao (Xe), BE] E Eons. 


Introduce also the nxk matrix 
Ro = [gs (Xe), ++ +3 Be (Xe)s ME (Xe), o +3, AY (x.)]. 


Then we can write Nj (dy) in the form 


wep LS anor ny oe 


By Aso the diagonal matrix -D(T,) is positive definite 
and the diagonal matrix -D(T,) is nonnegative definite. From 
Ag and Aso it follows that all the coordinates of V_ are not 
equal to zero, and therefore the rank of R.D(V) is the same as 
ia uaOrt Be, that is maximal and equal to k. Therefore, the 
upper left block of the representation of Nj (ay) is the sum of 
two matrices, the first matrix being positive definite and the 
other being nonnegative definite. Their sum is a positive defi- 
nite matrix, hence the whole matrix Nj (ay) is positive definite. 


Using the partition of $M_{qq}(q_*)$ into blocks one can show that its determinant is expressed in terms of determinants of the matrices $M_{xx}(q_*)$ and $N_1(q_*)$ by the formula

$$|M_{qq}(q_*)| = |M_{xx}(q_*)| \cdot |-N_1(q_*)|,$$

implying that $M_{qq}(q_*)$ is nonsingular.

The proof of the fact that the characteristic values of the matrix $H_1(q_*)$ have strictly negative real parts is an almost verbatim proof of the same property of $H(z)$ (see Lemma 4.1.2). By Theorem 1.5.9, $q_*$ is a strict local maximin point of the problem (5.11). ///

We use the methods (1.21) - (1.25) to find the local maximin, taking as $F(x,y)$ the function $M(x, \lambda, \mu)$ and as $y$ the vector $[\lambda, \mu] \in E^m$. Let us reformulate the convergence Theorem 4.1.2 as follows.


THEOREM 4.5.2. Let the conditions of Lemma 4.5.2 be satisfied at 


the Kuhn-Tucker point 2, = [XyoUy»Vy]- Then there exist 6 > 0, 


& > O such that for any fixed 0 <e < é, Que a < 9 Sand for 
ea=91, the, methods (1, 21) )- (1.25) and their discrete variants 
of the form (2.6.20) converge locally to the stationary point $q_*$
of the function $M(q)$.

One can analogously use Newton's method and give sufficient 


convergence conditions. 


3. PARTICULAR CASES 


The class of functions 9 satisfying Ag is extremely rich. For 


oa) AMOUR areca ABE eror oe, 7, 6° the condition At 


is satisfied and, in addition, 


ge (0, M) = gn (0, M)=1, Pan (0, M) =O, 
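i.e., conditions $A_9$ hold. For $\varphi^5$, $\varphi^6$ conditions $A_9$ are also valid since here

$$\varphi^5_{gg}(0, \lambda^i) = \varphi^6_{gg}(0, \lambda^i) = 1, \quad \varphi^5_{g\lambda}(0, \lambda^i) = \operatorname{ch} \lambda^i \ne 0,$$

$$\varphi^6_{g\lambda}(0, \lambda^i) = 1 + \varepsilon \cos \lambda^i \ne 0, \quad \varphi^5_{\lambda\lambda}(0, \lambda^i) = \varphi^6_{\lambda\lambda}(0, \lambda^i) = 0.$$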


The twice differentiable functions $\psi$ are more complicated.


Let us give an example of such a function: 


1 (n/)4 
8 (fA/ al (ed IAS ee (ee 
VM, Wha g | (+ — WW 
; 8 awe ow aie 
It is easy to see that yp satisfies Ax, and ava =aCUl De for 
nJ =0. If mY <0 then p = vi = 0. 
Calculating the second derivatives of we, £0ra3) <s(x,) 


we have 


hn (0, pL) = Pan (0, wt) = 3 (u/)% D0, 
Au (0, wt) = 3 [(w)% —(u/)?] <0. 


For j # 0(x,) we obtain 


hh (A/ (X4), pt) a ith (hi (x) ut) — 0, 
; j cain es 

Wha (Xe), ee) = ay 

Uae 2[1+ [rl (x) ]}*] 

If at the Kuhn-Tucker point the strict complementarity condition is satisfied, then $\mu_*^j > 0$ for $j \in \sigma(x_*)$ and thus condition $A_{10}$ is satisfied. To ensure the condition of Lemma 4.5.2 that the value of $\Delta_1(x_*, \lambda_*, \mu_*)$ is sufficiently large, one can use the scaling described in Subsection 4.2.3.
6. REDUCTION TO A MINIMAX PROBLEM


In Section 1.6 we examined the minimax problem (1.6.8). Using 
it for numerical solution of (1.6.1) is complicated by the fact 


that for infeasible points the interior problem has infinitely large dual variables as its solution. Hence it is appropriate to
modify the Lagrangian so that in finding the minimax the interior 
problem has bounded solutions. We have been able to construct 
such functions, but we had, however, to introduce more complex 


modifications depending on the vector x and the dual vector, as 


well as on the gradients of the functions defining the problem 

(1.6.1). The numerical methods obtained through this approach possess some new characteristics and complement the methods described in the preceding sections of this chapter. We will go into more detail on these properties at the end of the section.


1. PRELIMINARY RESULTS


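Introduce the following modified Lagrangian:

$$H(x, u, w) = F(x, u, w) - \tau \sum_{i=1}^{n} \varphi\!\left(\frac{\partial F(x, u, w)}{\partial x^i}\right), \qquad (6.1)$$

where $F$ is defined by (1.1), $\tau$ is a positive parameter, and $\varphi(t)$ satisfies

$A_{11}$: The function $\varphi(t)$ of a scalar argument is twice continuously differentiable on $E^1$, $\varphi'(t) \ne 0$ for any $t \ne 0$; $\varphi'(0) = 0$, $\varphi''(0) = 1$.

As the simplest examples of $\varphi$ we use

$$\varphi_1(t) = \tfrac{1}{2} t^2, \quad \varphi_2(t) = \operatorname{ch} t, \quad \varphi_3(t) = t \operatorname{arctg} t - \tfrac{1}{2} \ln(1 + t^2), \quad \varphi_4(t) = e^t - t.$$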
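The requirements on $\varphi$ can be checked numerically for the four sample functions, read here as $t^2/2$, $\operatorname{ch} t$, $t \operatorname{arctg} t - \tfrac12 \ln(1 + t^2)$ and $e^t - t$. The sketch below is only an illustration: it uses central finite differences (step size and tolerances are arbitrary choices) to test that the first derivative vanishes at zero, that the second derivative equals one at zero, and that the first derivative is nonzero at a few sample points away from zero.

```python
import math

# The four sample functions, as read from the text.
PHI = {
    "phi1": lambda t: 0.5 * t * t,
    "phi2": lambda t: math.cosh(t),
    "phi3": lambda t: t * math.atan(t) - 0.5 * math.log1p(t * t),
    "phi4": lambda t: math.exp(t) - t,
}

def d1(phi, t, h=1e-4):
    """Central-difference approximation of the first derivative."""
    return (phi(t + h) - phi(t - h)) / (2.0 * h)

def d2(phi, t, h=1e-4):
    """Central-difference approximation of the second derivative."""
    return (phi(t + h) - 2.0 * phi(t) + phi(t - h)) / (h * h)

def satisfies_condition(phi, tol=1e-5):
    """Check phi'(0) = 0, phi''(0) = 1, and phi'(t) != 0 at sample points t != 0."""
    ok_zero = abs(d1(phi, 0.0)) < tol and abs(d2(phi, 0.0) - 1.0) < tol
    ok_nonzero = all(abs(d1(phi, t)) > 1e-3 for t in (-2.0, -0.5, 0.5, 2.0))
    return ok_zero and ok_nonzero

results = {name: satisfies_condition(phi) for name, phi in PHI.items()}
print(results)
```

The analytic derivatives are $t$, $\operatorname{sh} t$, $\operatorname{arctg} t$ and $e^t - 1$, so each function meets the stated requirements exactly; the finite-difference check merely confirms this.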


Let us calculate the derivatives of $H$:

$$H_x(x, u, w) = F_x(x, u, w) - \tau F_{xx}(x, u, w)\,\varphi'(F_x(x, u, w)),$$

$$H_u(x, u, w) = F_u(x, u, w) - \tau g_x^T(x)\,\varphi'(F_x(x, u, w)), \qquad (6.2)$$

$$H_w(x, u, w) = F_w(x, u, w) - 2\tau D(w)\,h_x^T(x)\,\varphi'(F_x(x, u, w)).$$

Here and below $\varphi'(F_x)$ denotes the $n$-dimensional column vector with coordinates $\varphi'(\partial F/\partial x^1), \ldots, \varphi'(\partial F/\partial x^n)$.

From the formulas (6.2) and condition $A_{11}$ it follows that every stationary point of $F$ is a stationary point of $H$. In general, the converse is false.

Let

$$y = [u, w], \quad z = [x, y] \in E^{n+m}, \quad F(x, u, w) = F(x, y) = F(z), \quad H(x, u, w) = H(x, y) = H(z).$$

We write out (6.2) in the form

$$H_x(x, y) = F_x(x, y) - \tau F_{xx}(x, y)\,\varphi'(F_x(x, y)),$$

$$H_y(x, y) = F_y(x, y) - \tau F_{yx}(x, y)\,\varphi'(F_x(x, y)). \qquad (6.3)$$


Later on we will need the following lemma.

LEMMA 4.6.1. Let $z_* = [x_*, y_*]$ be a stationary point of the function $F(z)$, let the functions defining the problem be twice continuously differentiable in a neighborhood of $x_*$, and let $A_{11}$ be satisfied. Then the function

$$B(x) = \sum_{i=1}^{n} \varphi\!\left(\frac{\partial F(x, y_*)}{\partial x^i}\right)$$

is twice differentiable with respect to $x$ at $x_*$ and we have

$$\frac{d^2}{dx^2} B(x_*) = F_{xx}^2(x_*, y_*). \qquad (6.4)$$

Here and below $F_{xx}^2(z) = F_{xx}(z) F_{xx}(z)$.

Proof. The function $B(x)$ is continuously differentiable in a neighborhood of the point $x_*$, which is a stationary point of $B(x)$.
Let $B_x(x)$ denote the gradient of $B(x)$ at the point $x$. Also, let us estimate the norm

$$\Delta(x) = \|B_x(x) - B_x(x_*) - F_{xx}^2(z_*)(x - x_*)\| = \|F_{xx}(x, y_*)\,\varphi'(F_x(x, y_*)) - F_{xx}^2(z_*)(x - x_*)\|$$

$$\le \|F_{xx}(x, y_*)\,[\varphi'(F_x(x, y_*)) - F_{xx}(z_*)(x - x_*)]\| + \|[F_{xx}(x, y_*) - F_{xx}(z_*)]\,F_{xx}(z_*)(x - x_*)\|.$$

Set $\beta = \|F_{xx}(z_*)\|$. Since the functions $f(x)$, $g(x)$ and $h(x)$ are twice continuously differentiable, for any $\varepsilon > 0$ one can find a neighborhood $G(x_*)$ of $x_*$ such that

$$\|F_{xx}(x, y_*) - F_{xx}(x_*, y_*)\| \le \varepsilon \quad \forall x \in G(x_*).$$

For these same $x$ the inequality $\|F_{xx}(x, y_*)\| \le \beta + \varepsilon$ will also be satisfied.

Next, by the differentiability of the vector function $\varphi'(F_x(x, y_*))$, noting that $\varphi''(0) = 1$, we can take the neighborhood $G(x_*)$ so small that

$$\|\varphi'(F_x(x, y_*)) - \varphi'(F_x(z_*)) - F_{xx}(z_*)(x - x_*)\| \le \varepsilon \|x - x_*\|.$$

Then for all $x \in G(x_*)$ we have $\Delta(x) \le (2\beta + \varepsilon)\,\varepsilon\,\|x - x_*\|$. And since $\varepsilon$ is arbitrary, from the last inequality we obtain the formula (6.4). ///
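Formula (6.4) can be illustrated in the scalar case $n = 1$. The sketch below uses a hypothetical function $F(x) = x^3/3 + x^2$ (with $y$ held fixed and suppressed), which has the stationary point $x_* = 0$ and $F_{xx}(0) = 2$, together with $\varphi = \operatorname{ch}$, which has $\varphi'(0) = 0$ and $\varphi''(0) = 1$. The second derivative of $B(x) = \varphi(F_x(x))$ at $x_*$ is approximated by a central difference and compared with $F_{xx}^2(x_*) = 4$.

```python
import math

def F_x(x):
    """First derivative of the hypothetical F(x) = x**3/3 + x**2; F_x(0) = 0."""
    return x * x + 2.0 * x

F_XX_AT_0 = 2.0   # second derivative of F at the stationary point x* = 0

def B(x):
    """B(x) = phi(F_x(x)) with phi = cosh (phi'(0) = 0, phi''(0) = 1)."""
    return math.cosh(F_x(x))

def second_derivative(func, x, h=1e-3):
    """Central-difference approximation of the second derivative."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

b_xx = second_derivative(B, 0.0)
# Left: finite-difference value of B''(x*); right: F_xx(x*)**2 predicted by (6.4).
print(b_xx, F_XX_AT_0 ** 2)
```

The term involving the third derivative of $F$ drops out at $x_*$ because it is multiplied by $\varphi'(F_x(x_*)) = \varphi'(0) = 0$, which is exactly why the stationarity of $x_*$ is needed in the lemma.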
2. INVESTIGATION OF THE MINIMAX PROBLEM

Consider the auxiliary minimax problem

$$\min_{x \in E^n} \max_{y \in E^m} H(x, y). \qquad (6.5)$$


THEOREM 4.6.1. Let the conditions of Theorem 1.7.4 be satisfied at the Kuhn-Tucker point $[x_*, u_*, v_*]$ for the problem (1.6.1) and let the constraint qualification be satisfied at the point $x_*$. Furthermore, let the functions defining the problem (1.6.1) be twice continuously differentiable in a neighborhood of $x_*$ and let $A_{11}$ be satisfied. Then there exists a $\tau_* > 0$ such that for all $0 < \tau < \tau_*$ the point $z_* = [x_*, u_*, w_*]$, where $w_*^j = \sqrt{v_*^j}$, $j \in [1:c]$, will be a strict local minimax point of the problem (6.5).

Proof. To the Kuhn-Tucker point $[x_*, u_*, v_*]$ there corresponds a
point $z_* = [x_*, u_*, w_*]$ which is a stationary point of H. By
Theorem 1.5.7, for $z_*$ to be a strict local minimax point it is
sufficient that the matrix $H_{yy}(z_*)$ be negative definite and the
matrix

$$\Phi(z_*, \tau) = H_{xx}(z_*) - H_{xy}(z_*) H_{yy}^{-1}(z_*) H_{yx}(z_*) \tag{6.6}$$

be positive definite.

We obtain these matrices of second derivatives by differentiating
(6.3), noting the stationarity conditions and the assertion
of Lemma 4.6.1:

$$H_{xx}(z_*) = F_{xx}(z_*) - \tau F_{xx}(z_*) F_{xx}(z_*),$$
$$H_{yy}(z_*) = F_{yy}(z_*) - \tau F_{yx}(z_*) F_{xy}(z_*),$$
$$H_{yx}(z_*) = F_{yx}(z_*)\,(I_n - \tau F_{xx}(z_*)).$$


One can assume without loss of generality that $h^j(x_*) = 0$ for
$1 \le j \le s$ and $h^j(x_*) < 0$ for $s < j \le c$. Let

$$R(x) = [g^1(x), \ldots, g^e(x), h^1(x), \ldots, h^s(x)],$$
$$h^{(1)}(x) = [h^1(x), \ldots, h^s(x)], \quad h^{(2)}(x) = [h^{s+1}(x), \ldots, h^c(x)].$$


Noting the strict complementarity condition, we write the matrix
$H_{yy}(z_*)$ in block form:

$$H_{yy}(z_*) = \begin{bmatrix} -\tau N_1 & 0_{(e+s)(c-s)} \\ 0_{(c-s)(e+s)} & -N_2 \end{bmatrix},$$

$$N_1 = D(p)\,[R_x(x_*)]^T R_x(x_*)\,D(p), \quad N_2 = -2D(h^{(2)}(x_*)).$$


Here p is an (e + s)-dimensional vector whose first e coordinates
are ones, and each subsequent (e + j)th coordinate is $2w_*^j$.
All diagonal elements of the diagonal matrices D(p) and $N_2$
are strictly positive. Therefore, noting the linear independence
of the vectors $g_x^i(x_*)$, $h_x^j(x_*)$, we reach the conclusion that
$H_{yy}(z_*)$ is negative definite and therefore has a negative
definite inverse, which can be written in the form

$$H_{yy}^{-1}(z_*) = \begin{bmatrix} -\tau^{-1} N_1^{-1} & 0_{(e+s)(c-s)} \\ 0_{(c-s)(e+s)} & -N_2^{-1} \end{bmatrix}. \tag{6.7}$$

If we substitute this expression into (6.6) and note that $N_2^{-1}$
will be multiplied on the left and right by matrices which, by the
strict complementarity condition, are null, we obtain

$$\Phi(z_*, \tau) = (I_n - \tau\Gamma)\,[\Gamma + \tau^{-1} W (I_n - \tau\Gamma)],$$

where

$$W = R_x(x_*)\,D(p)\,N_1^{-1} D(p)\,[R_x(x_*)]^T, \quad \Gamma = F_{xx}(z_*).$$

One can show that for sufficiently small τ the matrix
$\Phi(z_*, \tau)$ is positive definite. Using Theorem 1.5.7, we then
conclude that $z_*$ is a strict local minimax point of the problem
(6.5). ///


If in the conditions of the Theorem we additionally require
$L_{xx}(x_*, u_*, v_*)$ to be positive definite, then for sufficiently
small τ the matrix $H_{xx}(z_*)$ will be positive definite and,
therefore, H(z) has a strict local saddle point at $z_*$.
This is the case, for instance, for a convex programming problem
with strictly convex f(x).


3. NUMERICAL METHODS 


If the conditions of Theorem 1.7.4 are satisfied, then to solve
the problem (6.5) one can use methods for finding local minimax
points. However, the function H depends on both the functions
defining the problem (1.6.1) and their derivatives, which
essentially complicates the numerical implementation of the methods
given in Section 2.6. Let us construct methods analogous to (1.25)
without this drawback.

For fixed x we solve the auxiliary problem of finding the
value of the vector function y(x) = [u(x), w(x)] from the
condition

$$H(x, y(x)) = \max_{y} H(x, y). \tag{6.8}$$

We iterate by any one of the following schemes:

$$x_{k+1} = x_k - \tau F_x(x_k, y_k), \tag{6.9}$$
$$x_{k+1} = x_k - \tau \varphi'(F_x(x_k, y_k)), \tag{6.10}$$

where $y_k = y(x_k)$. In this case, the analog of Theorem 4.2.1 is


THEOREM 4.6.2. Let the function φ satisfy condition A and let
the sequence $\{x_k, y_k\}$ obtained from (6.8) and from (6.9) or
(6.10) converge to the point $[x_*, u_*, w_*]$. Then $[x_*, u_*, v_*]$,
where $v_*^j = [w_*^j]^2$, $j \in [1:c]$, is a Kuhn-Tucker point of the
problem (1.6.1).

Proof. By condition A, the convergence of $\{x_k\}$ implies that the
condition $F_x(x_*, u_*, w_*) = 0$ is satisfied at the limit point.
Moreover, at this point the necessary condition for a maximum
of $H(x_*, u, w)$ in u and w must hold. Hence $F_u(x_*, u_*, w_*) = 0$,
$F_w(x_*, u_*, w_*) = 0$, whence $g(x_*) = 0$ and $w_*^j h^j(x_*) = 0$ for all
$j \in [1:c]$. By Theorem 1.3.1 it is necessary that the matrix
$F_{ww}(x_*, u_*, w_*)$ be nonpositive definite, so that $x_* \in X$. ///
THEOREM 4.6.3. Let the conditions of Theorem 4.6.1 be satisfied. 
Then we can find a $\bar\tau > 0$ such that for all $0 < \tau < \bar\tau$ the
iterations given by (6.9), (6.10) converge locally to $x_*$ at a
linear rate.


We prove only (6.9); (6.10) is proved similarly. The necessary
condition of the maximum in (6.8) is

$$H_y(x, y(x)) = 0.$$

In proving Theorem 4.6.1 we showed that the matrix $H_{yy}(x_*, y_*)$ is
negative definite. Hence there exists a neighborhood of $x_*$
where (6.8) has a local solution, the function y(x) is
single-valued, differentiable at the point $x = x_*$, and

$$\frac{dy(x_*)}{dx} = -H_{yy}^{-1}(x_*, y_*)\,H_{yx}(x_*, y_*). \tag{6.11}$$


Let us differentiate the composite function

$$T(x, \tau) = x - \tau F_x(x, y(x))$$

with respect to x at the point $x = x_*$. Noting (6.11), we have

$$T_x(x_*, \tau) = I_n - \tau F_{xx}(z_*) + \tau F_{xy}(z_*)\,H_{yy}^{-1}(z_*)\,H_{yx}(z_*).$$

Substituting here the formula (6.7), we obtain

$$T_x(x_*, \tau) = P\,(I_n - \tau\Gamma),$$

where $P = I_n - W$. Next we consider


$$\eta(\tau) = \|V(\tau)\| = \max_{q} \frac{q^T V(\tau)\, q}{\langle q, q \rangle}, \quad V(\tau) = T_x^T(x_*, \tau)\, T_x(x_*, \tau).$$

We see by a direct check that the symmetric matrix P is such
that PP = P. Hence

$$\eta(0) = \max_{q} \frac{q^T P q}{\langle q, q \rangle} = 1, \quad V'(+0) = -[F_{xx}(z_*) P + P F_{xx}(z_*)]. \tag{6.12}$$

The maximum in (6.12) is attained only for those q for which
Pq = q, i.e., when $q \in K(x_*)$ (see the formula (1.7.16)). We
use Theorem 1.5.3 on differentiating the maximum function
and obtain

$$\eta'(+0) = \max_{\|q\|=1,\ Pq=q} q^T V'(+0)\, q = -2 \min_{\|q\|=1,\ Pq=q} q^T P F_{xx}(z_*)\, q < 0,$$

which together with (6.12) implies the existence of a $\bar\tau > 0$ such
that $\eta(\tau) < 1$ for all $0 < \tau < \bar\tau$, and hence $\|T_x(x_*, \tau)\| < 1$.
Noting Theorem 2.3.4, we conclude that the iterations defined by
(6.9) converge linearly. ///
4. COMPUTATIONAL ASPECTS


In method (2.14) the auxiliary problem (2.3) is to minimize H
with respect to x, and iterations are carried out for the dual
vectors. Therefore, such methods are often called dual methods.
Conversely, in the methods (6.9), (6.10) the auxiliary problem
(6.8) is to maximize H with respect to the dual vector, and
the primal vector x is updated iteratively. (Some authors call
such methods primal. The terminology has not been firmly
established; however, it is convenient to use this term here.)

In the numerical calculations, most of the time is spent solving
auxiliary problems. Hence, if the total number of constraints m
is much greater than the dimension n of the vector x, the dual
methods are more appropriate, since in this case solving (2.3) is
much simpler than solving (6.8). If n is much greater than m,
then, conversely, one should use primal methods. Whenever the
calculation of f, g, h is expensive, it is also better to use
primal methods. This is because in solving an auxiliary problem only
the dual vector changes, whereas the primal vector is recomputed
only on the "exterior" iterations, which occur far less often.

The region of convergence depends on many factors, in
particular, on the form of φ(t) and on which of the methods, (6.9) or
(6.10), has been used. For several test problems, it turned out
that the convergence region is greater for method (6.10). The
choice of the parameter τ determines the region and rate of
convergence. It has been found experimentally that the convergence
rate decreases as τ decreases; hence it is not worthwhile to
take τ too close to zero.

tf we take @ for %, then (6.9) and (6.10) coincide, and 
the auxiliary problem (6.8) for +t = 1 is very close to the pro- 
bilem MCS Orn) et Ne 21. Thus, these methods are related to 


k 


the linearization method. 


Chapter 5 


RELAXATION METHODS 
FOR SOLVING 
NONLINEAR PROGRAMMING PROBLEMS 


By relaxation methods we mean iterative numerical methods where 


the objective function is monotonically decreasing. 


1. APPLICATION OF THE REDUCED-GRADIENT METHOD TO 


SOLVING PROBLEMS WITH EQUALITY-TYPE CONSTRAINTS 


1. THE IDEA OF THE METHOD


We consider the simplest nonlinear programming problem (1.6.1)
involving only equality-type constraints. Suppose we need to find

$$\min_{x \in Y} f(x), \quad Y = \{x \in E^n : g(x) = 0\}. \tag{1.1}$$


In this chapter we assume that the functions determining the
problem are differentiable with respect to x. Let $x(x_0, t)$
denote the solution of the Cauchy problem for the system of ordinary
differential equations

$$\frac{dx}{dt} = -[f_x(x) + g_x(x)\, u]. \tag{1.2}$$


We will find the feasible values of $u \in E^e$ by the following
requirement: the set Y must be invariant with respect to
the system (1.2), i.e., if $x_0 \in Y$ then $x(x_0, t) \in Y$ for all
$t \ge 0$, or equivalently $g(x(x_0, t)) \equiv 0$. Hence

$$\frac{dg}{dt} = -g_x^T(x)\,[f_x(x) + g_x(x)\, u] = 0. \tag{1.3}$$

If the matrix $g_x^T(x)\, g_x(x)$ is nonsingular, then from (1.3) we find

$$u = -[g_x^T(x)\, g_x(x)]^{-1} g_x^T(x)\, f_x(x).$$

Substituting this expression into (1.2), we have

$$\frac{dx}{dt} = -\{I_n - g_x(x)\,[g_x^T(x)\, g_x(x)]^{-1} g_x^T(x)\}\, f_x(x), \quad x_0 \in Y. \tag{1.4}$$

Differentiating f(x) using the system (1.4), we obtain

$$\frac{df}{dt} = -f_x^T(x)\,[f_x(x) + g_x(x)\, u].$$

After some transformations, noting (1.3), we obtain

$$\frac{df}{dt} = -\|f_x(x) + g_x(x)\, u\|^2 \le 0. \tag{1.5}$$


Hence the set Y is invariant with respect to the system
(1.2), and along any solution $x(x_0, t)$ of the system (1.4) the
objective function $f(x(x_0, t))$ monotonically decreases. In the
sequel we show that the solutions of (1.4) converge to local
solutions of (1.1). Hence, finding the limit points of solutions
of the Cauchy problem (1.4) yields a numerical method of
solving the problem (1.1). To implement this method, we have to
invert the matrix $\Gamma(x) = g_x^T(x)\, g_x(x)$. Its determinant is usually
called the Gram determinant for the set of vectors
$g_x^1(x), \ldots, g_x^e(x)$. It is known (see, e.g., Gantmacher
[1]) that the Gram determinant of the matrix Γ(x) is nonzero
if and only if the vectors $g_x^i(x)$,
$i \in [1:e]$, are linearly independent. Hence the right side of
(1.4) is defined everywhere on Y if g(x) = 0 satisfies the
constraint qualification on Y (see Definition 1.7.3).
Let us dwell on a geometric interpretation of the method.
We write the system (1.4) in the form

$$\frac{dx}{dt} = -M(x)\, f_x(x), \quad M(x) = I_n - N(x), \quad N(x) = g_x(x)\,[g_x^T(x)\, g_x(x)]^{-1} g_x^T(x). \tag{1.6}$$

The matrices M(x) and N(x) are symmetric. A direct check
shows that $M^2 = MM = M$, $N^2 = NN = N$. Such matrices are
called idempotent. Since their eigenvalues are zero or one, these
matrices are nonnegative definite.
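
The idempotency of M and N can be checked numerically. The following is a minimal sketch, assuming a single hypothetical constraint g(x) = x1^2 + x2^2 + x3^2 - 1 (not taken from the text), so that the Gram matrix reduces to a scalar; the helper names are illustrative only.

```python
def projectors(gx):
    """Return (M, N) for a single constraint gradient gx (list of floats)."""
    n = len(gx)
    gram = sum(v * v for v in gx)          # scalar g_x^T g_x, must be nonzero
    N = [[gx[i] * gx[j] / gram for j in range(n)] for i in range(n)]
    M = [[(1.0 if i == j else 0.0) - N[i][j] for j in range(n)] for i in range(n)]
    return M, N

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical example: g(x) = x1^2 + x2^2 + x3^2 - 1 at the feasible point x
x = [0.6, 0.0, 0.8]
gx = [2.0 * v for v in x]
M, N = projectors(gx)

# Largest deviation from M*M = M and N*N = N
idempotency_error = max(max_abs_diff(matmul(M, M), M), max_abs_diff(matmul(N, N), N))
# M annihilates the constraint gradient: M g_x = 0
Mg = [sum(M[i][j] * gx[j] for j in range(3)) for i in range(3)]
```

Both quantities vanish up to rounding, which is the numerical counterpart of M projecting onto the tangent subspace.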

If x ∈ Y and g(x) = 0 satisfies the constraint qualification,
then the cone of tangential directions (the tangent
subspace) to the set Y at the point x is given by

$$K(x) = \{\bar x \in E^n : g_x^T(x)\, \bar x = 0\},$$

which is the orthogonal complement of the subspace generated by
the independent vectors $g_x^1(x), \ldots, g_x^e(x)$. An arbitrary
vector $z \in E^n$ is representable in the form

$$z = a + b, \tag{1.7}$$


where a is the projection of z on the tangent subspace and b
is its projection onto the orthogonal complement. The vector b
is a linear combination of the vectors $g_x^i(x)$:

$$b = g_x(x)\, d, \quad d \in E^e.$$

Substituting this into (1.7), multiplying (1.7) on the left by
$g_x^T(x)$ and noting that $g_x^T(x)\, a = 0$, we obtain

$$d = [g_x^T(x)\, g_x(x)]^{-1} g_x^T(x)\, z, \quad a = M(x)\, z, \quad b = N(x)\, z, \quad a^T b = 0.$$

Hence the matrix M(x) projects each vector $z \in E^n$ onto the
tangent subspace and the matrix N(x) projects onto its orthogonal
complement. Any vector z in the tangent subspace is projected by
M(x) onto itself, since if z ∈ K(x) then $g_x^T(x)\, z = 0$ and
M(x)z = z; the projection of z onto the orthogonal complement is
N(x)z = 0. If z is any vector in $E^n$, then M(x)z ∈ K(x)
since $g_x^T(x)\, M(x)\, z = 0$.

Thus the right side of (1.4) is the orthogonal projection of
the antigradient $-f_x(x)$ onto the tangent subspace, so that
the functions $g^i(x)$ are integrals of the system (1.4). Several authors
call the vector on the right in (1.4) the "reduced antigradient."
Hence we will call the method (1.4) the reduced-gradient method.
In the particular case where there are no constraints, the method
(1.4) becomes the Cauchy gradient method (2.2.3) of unconstrained
minimization of the function f(x).

Let the feasible point $x_*$ be an equilibrium for the system
(1.4). From (1.3) we find the corresponding vector $u_*$, and
noting (1.5) we have the relations

$$L_x(x_*, u_*) = f_x(x_*) + g_x(x_*)\, u_* = 0.$$

Hence, if the constraints satisfy the constraint qualification,
then to each equilibrium point of the system (1.4) there
corresponds a Kuhn-Tucker point $[x_*, u_*]$ of the problem (1.1). From
(1.5) it follows that at an equilibrium point the projection of
$f_x$ onto the tangent subspace is equal to zero, which is a
necessary condition for an extremum of the problem (1.1).

2. PROOF OF CONVERGENCE


Introduce the set

$$\Omega = \{x \in E^n : f(x) \le f(x_0),\ x \in Y\}. \tag{1.8}$$

THEOREM 5.1.1. Let the functions defining the problem (1.1) be
differentiable on an open set containing Ω, where g(x) = 0
satisfies the constraint qualification; let the local minimum of
f(x) be attained on Ω at the unique point $x_*$. Then the
solutions of the system (1.4) converge to the point $x_*$ as t → ∞.
Proof. We use the Lyapunov function $v(x) = f(x) - f(x_*)$. For
any t ≥ 0 we have $x(x_0, t) \in \Omega$, $v(x(x_0, t)) \ge 0$, and by (1.5)
$v(x(x_0, t))$ is a monotonically decreasing function of t.
Arguing just as in proving Theorem 2.2.3, we arrive at the required
assertion. ///


3. COMPUTATIONAL ASPECTS 


In implementing this method, the system (1.4) is numerically
integrated. Using, say, Euler's method, we obtain the following
discrete approximation:

$$x_{k+1} = x_k - \alpha M(x_k)\, f_x(x_k), \quad x_0 \in Y. \tag{1.9}$$

The accuracy of the integration must be sufficiently high, since
otherwise the computed trajectory may leave the feasible set.
Hence in (1.9) we either need to take small steps α or use more
accurate integration schemes.

The simplest case is when the vector function g(x) is linear.
Here the matrix M is constant, and from (1.9) it follows
that

$$g_x^T x_{k+1} = g_x^T x_k - \alpha\, g_x^T M f_x(x_k) = g_x^T x_k = \cdots = g_x^T x_0.$$

Thus $g(x_k) = g(x_0)$ for any value α, so that the choice of the
step can be made without caring about the feasibility condition.
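
The linear case can be illustrated with a short sketch of the Euler scheme (1.9). The objective f(x) = x1^2 + 2 x2^2 + 3 x3^2, the constraint x1 + x2 + x3 = 1, the step size and the iteration count are all assumptions chosen for illustration; for this constraint M = I - a a^T / (a^T a) with a = (1, 1, 1), so applying M amounts to subtracting the mean.

```python
def grad_f(x):
    # Gradient of the hypothetical objective f(x) = x1^2 + 2*x2^2 + 3*x3^2
    return [2.0 * x[0], 4.0 * x[1], 6.0 * x[2]]

def project_tangent(v):
    # M v with M = I - a a^T / (a^T a) for the linear constraint a^T x = 1, a = (1, 1, 1)
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

def reduced_gradient_euler(x0, alpha, steps):
    """Iterate x_{k+1} = x_k - alpha * M * f_x(x_k) from a feasible start x0."""
    x = list(x0)
    for _ in range(steps):
        d = project_tangent(grad_f(x))
        x = [xi - alpha * di for xi, di in zip(x, d)]
    return x

x_final = reduced_gradient_euler([1.0, 0.0, 0.0], alpha=0.05, steps=500)
constraint_residual = sum(x_final) - 1.0
```

For this example the Lagrange conditions give the minimizer (6/11, 3/11, 2/11), and the constraint residual stays at rounding level regardless of the step size, as the text states.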
Several authors suggest that an additional correction should
be made on each kth iteration (1.9), forcing the point $x_k$ back
to the feasible set by solving the system of equations g(x) = 0.
Here the point $x_k$ given by (1.9) is taken as the initial
approximation. However, such calculations substantially complicate
the method.

It is interesting that the method (1.6) can be derived from 
method (4.6.9) as a special case. Indeed, let us take oy for 
φ in (4.6.1). We consider the problem of nonlinear programming
(1.1) with equality-type constraints only. Then the necessary
condition of the maximum in (4.6.8) is that $H_u(x, u) = 0$.
Determining u from this relation, we have

$$u = [g_x^T g_x]^{-1} \Big[\frac{1}{\tau}\, g - g_x^T f_x\Big]. \tag{1.10}$$

We use the continuous analog of the method (4.6.9):

$$\frac{dx}{dt} = -[f_x + g_x u],$$

where we substitute the expression for u from (1.10):

$$\frac{dx}{dt} = -M f_x - \frac{1}{\tau}\, g_x\, [g_x^T g_x]^{-1} g. \tag{1.11}$$

Differentiating f and g, we obtain

$$\frac{df}{dt} = -\|M f_x\|^2 - \frac{1}{\tau}\, f_x^T g_x\, [g_x^T g_x]^{-1} g, \quad \frac{dg}{dt} = -\frac{1}{\tau}\, g. \tag{1.12}$$

Integrating the last expression, we find that

$$g(x(t)) = g(x_0)\, e^{-t/\tau}. \tag{1.13}$$


The expression (1.11) has a remarkable property: all its
trajectories, as t → ∞, approach the feasible set X. Here the
function f(x(t)) is not monotonic. This is seen from the right side
of (1.12), where the first term is always negative and the second
term can change sign and can be neglected for small ||g||. The
formula (1.13) suggests a new way of handling equality-type
constraints: if they are violated, one can integrate (1.11) instead
of (1.6), which will guarantee that the trajectories approach X.

From (1.13) it follows that if $g(x(t_0)) = 0$ then
$g(x(t)) \equiv 0$ for all t, and in this case the equation (1.11)
turns into (1.4), and the derivatives (1.12) coincide with (1.5) and
(1.3), respectively.
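
The restoring behaviour can be seen in a small numerical sketch. It assumes that the system has the form dx/dt = -M(x) f_x - (1/τ) g_x [g_x^T g_x]^{-1} g, and uses hypothetical data: f(x) = x1 + x2, g(x) = x1^2 + x2^2 - 1, τ = 1, an infeasible starting point and Euler steps of size 0.01.

```python
def step(x, tau, dt):
    # Hypothetical data: f(x) = x1 + x2, g(x) = x1^2 + x2^2 - 1
    fx = [1.0, 1.0]
    gx = [2.0 * x[0], 2.0 * x[1]]
    g = x[0] ** 2 + x[1] ** 2 - 1.0
    gram = gx[0] ** 2 + gx[1] ** 2                           # g_x^T g_x (scalar here)
    coef = (gx[0] * fx[0] + gx[1] * fx[1]) / gram
    tangential = [fx[i] - coef * gx[i] for i in range(2)]    # M(x) f_x
    normal = [gx[i] * g / (tau * gram) for i in range(2)]    # term driving g to zero
    return [x[i] - dt * (tangential[i] + normal[i]) for i in range(2)]

def g_value(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

x = [2.0, 0.5]                 # infeasible start, g = 3.25
g_start = g_value(x)
for _ in range(2000):          # Euler steps up to t = 20
    x = step(x, tau=1.0, dt=0.01)
g_end = g_value(x)
```

In this run the constraint violation decays to rounding level while the point slides to the constrained minimizer near (-0.7071, -0.7071); the Euler discretization only approximates the exponential law for g.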

Analogously, from (4.6.9) one can obtain computational
formulas taking inequality-type constraints into account, but they
are too complex to describe here.


2. A GENERALIZATION OF THE REDUCED-GRADIENT METHOD 


1. PRELIMINARY RESULTS 


The method described above carries over to the general nonlinear
programming problem in various ways. We shall examine one of the
approaches, using the notation (1.6.3) and (1.6.4). We say that
the points of $X_0$ are interior points of X and the points of
$X \setminus X_0$ are boundary points. We combine the equality- and
inequality-type constraints in Φ = [g, h]. The vector function
Φ(x) thus defines a mapping $\Phi : E^n \to E^m$, where m = e + c.


Let

$$\gamma(\Phi) = [\gamma(\Phi^1), \gamma(\Phi^2), \ldots, \gamma(\Phi^m)], \quad \sqrt{\gamma(-\Phi)} = [\sqrt{\gamma(-\Phi^1)}, \ldots, \sqrt{\gamma(-\Phi^m)}],$$

where the function γ(z) of a scalar argument is defined and
continuous for all values z ≥ 0 and satisfies the following
conditions:

y(0)=0, lim Y2@>50; y(z)>0 if z>0. 
Zee Oe 


As the simplest functions γ one can take γ(z) = z, $z^2$, etc.


For the numerical solution of problem (1.6.1) we suggest
finding the limit (as t → ∞) points of the solution of the Cauchy
problem for the system

$$\frac{dx}{dt} = -[f_x(x) + \Phi_x(x)\, y], \quad x_0 \in X. \tag{2.2}$$

Here the vector $y \in E^m$ is determined from solving the following
system of m linear equations:

$$\Gamma(x)\, y + \Phi_x^T(x)\, f_x(x) = 0, \tag{2.3}$$

where

$$\Gamma(x) = \Phi_x^T(x)\, \Phi_x(x) + D(\gamma(-\Phi(x))).$$

We find y from (2.3) and substitute it into the right side
of (2.2). Then (2.2) can be rewritten in the form

$$\frac{dx}{dt} = -M(x)\, f_x(x), \tag{2.4}$$
$$M(x) = I_n - N(x), \quad N(x) = \Phi_x(x)\, \Gamma^{-1}(x)\, \Phi_x^T(x). \tag{2.5}$$
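
A minimal sketch of the scheme (2.2), (2.3) for one inequality constraint follows. The problem data are hypothetical: f(x) = (x1 - 2)^2 + (x2 - 1)^2, h(x) = x1^2 + x2^2 - 1 ≤ 0, with the barrier γ(z) = z, so that Γ(x) is the scalar ||h_x||^2 - h(x). The Euler step 0.01 and the interior starting point are assumptions as well.

```python
def rhs(x):
    # Hypothetical problem: f(x) = (x1-2)^2 + (x2-1)^2, h(x) = x1^2 + x2^2 - 1 <= 0
    fx = [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]
    hx = [2.0 * x[0], 2.0 * x[1]]
    h = x[0] ** 2 + x[1] ** 2 - 1.0
    big_gamma = hx[0] ** 2 + hx[1] ** 2 + (-h)        # Gamma(x) with gamma(z) = z
    y = -(hx[0] * fx[0] + hx[1] * fx[1]) / big_gamma  # solves Gamma*y + h_x^T f_x = 0
    return [-(fx[i] + hx[i] * y) for i in range(2)], y, h

def f_value(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

x = [0.0, 0.0]                 # interior starting point
f_history, h_history = [f_value(x)], []
for _ in range(3000):          # Euler steps of size 0.01
    dx, y, h = rhs(x)
    h_history.append(h)
    x = [x[i] + 0.01 * dx[i] for i in range(2)]
    f_history.append(f_value(x))
```

In this run the iterates stay feasible, f decreases along the trajectory, and the point approaches (2, 1)/√5 on the boundary with multiplier y close to √5 - 1. For other starting points a finite Euler step may require a smaller step size to keep h(x) ≤ 0.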


Introduce the index set

$$\sigma(x) = \{i \in [1:m] : \Phi^i(x) = 0\}.$$

In the particular case where there are no inequality-type
constraints, the system (2.4) coincides with (1.4). We show in
the sequel that in the general case the system (2.4) preserves the
basic properties of the system (1.4): the feasible set X is
invariant with respect to (2.4), and the function $f(x(x_0, t))$
monotonically decreases on all trajectories of the system (2.4)
satisfying the condition $x_0 \in X_0$. One of the ways of deriving the
system (2.4) from (1.4) will be described in Subsection 5.2.3 for
γ(z) = z. Let us give sufficient conditions guaranteeing the
solvability of the system (2.3).

By Definition 1.7.3, we say that the function Φ(x) satisfies
the constraint qualification at x if all the vectors
$\Phi_x^j(x)$, $j \in \sigma(x)$, are linearly independent. From this definition
it follows that if the constraints satisfy the constraint
qualification at x, then the number of coordinates of Φ(x) which
simultaneously vanish does not exceed n.
LEMMA 5.2.1. If at each point $x \in X \setminus X_0$ the function Φ(x)
satisfies the constraint qualification, then the matrix Γ(x) is
nonsingular and nonnegative definite for all x ∈ X.
Proof. We write Γ(x) as a product of a rectangular matrix B(x)
(of dimension m × (n + m)) and its transpose $B^T(x)$, where B(x)
consists of two block matrices:

$$B(x) = [\Phi_x^T(x) \mid D(\sqrt{\gamma(-\Phi(x))})], \quad \Gamma(x) = B(x)\, B^T(x).$$

The Lemma will be proved if we can show that for any x ∈ X
the rank of B(x) is equal to m. Indeed, if the rank of B(x) is
maximal (equal to m), it then follows that Γ(x) is a nonsingular
nonnegative definite matrix. If there are no equality-type
constraints, then at each interior point $x \in X_0$ the rank of
B(x) is equal to m, since in this case for a nonzero minor of
B(x) we can take the determinant of the diagonal matrix
$D(\sqrt{\gamma(-\Phi(x))})$. The Lemma is also obvious if Φ(x) = 0: by the
linear independence of the vectors $\Phi_x^j(x)$ there is a nonzero
minor of order m for $\Phi_x^T(x)$.

Let s components, e ≤ s < m, of the vector function Φ(x)
be zero at x ∈ X. We can assume without loss of generality that
they are $\Phi^1(x), \Phi^2(x), \ldots, \Phi^s(x)$. Then the values of the
functions $\Phi^{s+1}(x), \ldots, \Phi^m(x)$ are strictly less than zero. In the
rectangular matrix

$$V_1(x) = \begin{bmatrix} [\Phi_x^1(x)]^T \\ \vdots \\ [\Phi_x^s(x)]^T \end{bmatrix}$$

with dimensions s × n, we choose a square matrix C(x) of
order s such that its determinant, being a minor of the matrix
$V_1(x)$, is nonzero. Such a matrix exists by
the constraint qualification. The determinant of the matrix

$$V_2(x) = \begin{bmatrix} C(x) & 0_{s(m-s)} \\ 0_{(m-s)s} & D\big(\sqrt{\gamma(-\Phi^{s+1}(x))}, \ldots, \sqrt{\gamma(-\Phi^m(x))}\big) \end{bmatrix}$$

of dimension m is not equal to zero. But the determinant of
$V_2(x)$ is simultaneously a minor of order m of the matrix B(x).
Hence the rank of B(x) is maximal, i.e., equal to m. ///

From the Lemma it follows that if the constraint
qualification is satisfied at each boundary point of the set X, then the
right sides of the system (2.4) are defined everywhere on X. In
the sequel we will say that the constraint qualification holds
everywhere on X if it holds at each boundary point of X.

LEMMA 5.2.2. Let the conditions of Lemma 5.2.1 be satisfied.
Then the symmetric matrix M(x) is nonnegative definite for all
x ∈ X.


Introduce the matrices

$$A = -\Gamma^{-1}(x)\, \Phi_x^T(x), \quad P = \begin{bmatrix} M(x) \\ D(\sqrt{\gamma(-\Phi(x))})\, A \end{bmatrix},$$

having dimension m × n and (n + m) × n, respectively. The proof
of the Lemma follows from the representation $M = P^T P$, which can
be checked by direct computations. ///
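
The factorization can be spot-checked numerically. The sketch below assumes a single constraint (m = 1), so that Γ(x) is the scalar ||Φ_x||^2 + γ(-Φ), and takes the second block of P to be D(√γ(-Φ)) A; the gradient and barrier values are arbitrary illustrative numbers.

```python
import math

def check_factorization(hx, barrier):
    """Return max |M - P^T P| for one constraint with gradient hx and barrier value gamma(-h)."""
    n = len(hx)
    big_gamma = sum(v * v for v in hx) + barrier           # scalar Gamma(x)
    M = [[(1.0 if i == j else 0.0) - hx[i] * hx[j] / big_gamma for j in range(n)]
         for i in range(n)]
    A = [-v / big_gamma for v in hx]                        # 1 x n row, A = -Gamma^{-1} Phi_x^T
    last_row = [math.sqrt(barrier) * v for v in A]          # D(sqrt(gamma)) A
    P = M + [last_row]                                      # stacked (n + 1) x n matrix
    PtP = [[sum(P[k][i] * P[k][j] for k in range(n + 1)) for j in range(n)]
           for i in range(n)]
    return max(abs(M[i][j] - PtP[i][j]) for i in range(n) for j in range(n))

error = check_factorization([1.0, -2.0, 0.5], barrier=0.3)
```

The residual is at rounding level, consistent with M being nonnegative definite as a Gram-type product.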

In what follows we will assume that for each $x_0 \in X$ the
system (2.2) determines a unique solution $x(x_0, t)$. Let

$$\Omega = \{x \in E^n : f(x) \le f(x_0),\ x \in X\}.$$

LEMMA 5.2.3. Let the functions defining the problem (1.6.1) be
continuously differentiable on an open set containing the compact
set X and let the function γ(z) satisfy the conditions (2.1).
Then for any $x_0 \in X_0$ the solutions $x(x_0, t)$ of the system (2.4)
can be extended as t → ∞, and the sets X and Ω are invariant
with respect to (2.4).
Proof. Let us calculate the derivative of the vector function
Φ(x) by the system (2.2):

$$\frac{d\Phi}{dt} = -\Phi_x^T f_x - \Phi_x^T \Phi_x\, y.$$


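Using formula (2.3), we obtain

$$\Phi_x^T \Phi_x\, y = -\Phi_x^T f_x - D(\gamma(-\Phi))\, y,$$
$$\frac{d\Phi}{dt} = D(\gamma(-\Phi(x)))\, y. \tag{2.6}$$

Calculating the squared norm of the gradient of the Lagrangian
$L(x, y) = f(x) + \langle \Phi(x), y \rangle$:

$$\|L_x(x, y)\|^2 = (f_x^T + y^T \Phi_x^T)\, M f_x = f_x^T M f_x + y^T \Phi_x^T M f_x = f_x^T M f_x - y^T D(\gamma(-\Phi))\, y,$$

we obtain

$$\frac{df}{dt} = -\|L_x(x, y)\|^2 - \|D(\sqrt{\gamma(-\Phi)})\, y\|^2 \le 0. \tag{2.7}$$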

IN SOikbagsoil tone (AaZ)) GelSis ele WeEeSis siere ie Stier wien 
$x(x_0, t) \in X$. Let us show that $x(x_0, t)$ does not leave the set X
for t ≥ 0. Suppose the contrary: $\Phi^j(x(x_0, t)) > 0$ for some
t > 0. Then there is a time $t_1$ such that $\Phi^j(x(x_0, t_1)) = 0$ and
$\dot\Phi^j(x(x_0, t_1)) > 0$. This contradicts (2.6), since γ(0) = 0. Hence
$x(x_0, t) \in X$ for all t ≥ 0. Thus the function γ(−Φ) introduced
above plays the role of a "barrier," preventing $x(x_0, t)$ from
intersecting the hypersurface $\Phi^j(x) = 0$. The trajectory $x(x_0, t)$
can approach the boundary points only as t → ∞. If the initial
point $x_0$ is on the boundary, the entire trajectory of the system
(2.4) belongs to the boundary. The functions g(x) defining
equality-type constraints are integrals of the system (2.4). Hence,
since the set X is bounded, the solutions of the system (2.4)
are extendable as t → ∞ and the set X is invariant with respect
to (2.4). From this and (2.7), Ω is invariant. ///

Let $x_*$ denote the points in X at which the right sides of
(2.2) vanish. We call them stationary points. We will denote the
corresponding values of y by $y_*$. At the points $x = x_*$ we
have

$$L_x(x_*, y_*) = 0, \quad D(\gamma(-\Phi(x_*)))\, y_* = 0, \quad \Phi_x^T(x_*)\, \Phi_x(x_*)\, y_* + \Phi_x^T(x_*)\, f_x(x_*) = 0. \tag{2.8}$$

Each point $x_*$ that is a local solution of the problem (1.6.1)
is stationary. Indeed, were this not so, then taking $x_*$ as an
initial point for the system (2.4) we would have that the solution
$x(x_*, t) \in X$ and $f(x(x_*, t)) < f(x_*)$ for t > 0, since df/dt < 0.
But this contradicts the condition for a local minimum of the
function f(x) on X.
Let us dwell on a geometric interpretation. Introduce the
tangent manifold to the set of active constraints at the point x:

$$K_1(x) = \{\bar x \in E^n : [\Phi_x^j(x)]^T \bar x = 0,\ j \in \sigma(x)\}.$$

The vector $\dot x$ defined by (2.4) belongs to $K_1(x)$ at each x.
Indeed, from (2.6) we have

$$\langle \Phi_x^j(x), \dot x \rangle = \gamma(-\Phi^j(x))\, y^j.$$

If j ∈ σ(x) at the point x, then $\Phi^j(x) = 0$ and the right
side of the equality is zero, hence $\dot x \in K_1(x)$.

Suppose j ∉ σ(x). By (2.1), if z → 0 then γ(z) → 0.
Therefore, the projection of $\dot x$ onto the gradient $\Phi_x^j(x)$ of an
inactive constraint tends to zero while approaching the hypersurface
$\Phi^j(x) = 0$. Owing to this, the trajectories of (2.4) do not cross
the boundary of the region determined by the inequality-type
constraints; the trajectories can approach the boundary arbitrarily
closely, touching it in the limit. The function γ(−Φ) automatically
changes the direction of the vector $\dot x(x_0, t)$ near the boundary.
Different types of barrier functions give rise to different kinds of
variation of this rate.

Far away from the hypersurface $h^j(x) = 0$, when $h^j(x) < 0$,
one need not fear that the trajectory $x(x_0, t)$ intersects it on
a small interval (t, t + δ), and in the formula for determining
M(x) we can omit the function $h^j(x)$ and its derivative, bringing
them into view only when $-\varepsilon < h^j(x(x_0, t)) \le 0$, where ε > 0
is chosen depending on the step of integrating the system (2.4).
This maneuver helps us to lower the order of the system (2.3).

On the other hand, the introduction of the barrier functions
γ(−Φ) causes the trajectories to "stick" to the boundary, since if
$h^j(x_0) = 0$ then $h^j(x(x_0, t)) \equiv 0$. It was postulated above that
all $h^j(x_0) < 0$. To remove this drawback, one can omit in the
formulas for M the functions $h^j$ and $h_x^j$ and calculate the
derivative $\dot h^j = \langle h_x^j, \dot x \rangle$ for the new system. If the derivative
turns out to be negative, then we continue the motion along the
trajectory of the new system. In other words, by removing the
"barrier" we check whether this can be done without violating the
feasibility condition.


2. PROOF OF CONVERGENCE


THEOREM 5.2.1. Let the conditions of Lemma 5.2.3 be satisfied
and let all stationary points of X be isolated. Then for any
nonstationary initial point $x_0 \in X_0$ the solution $x(x_0, t)$ of
the system (2.2) and the solution $y(x_0, t)$ of the system (2.3)
converge as t → ∞ to the Kuhn-Tucker point
$[x_*, y_*] = [x_*, u_*, v_*] \in E^{n+m}$.

Proof. Let $x_0$ be an arbitrary point in $X_0$ and let $x(x_0, t)$
be the solution of the Cauchy problem (2.2). Since X is compact,
the set ω of ω-limit points of the solution $x(x_0, t)$ is
nonempty. We show that ω is in the set of feasible stationary
points. Since f(x) is bounded below on X and $f(x(x_0, t))$ is
a monotonically decreasing function of t, all points of the set
ω lie on the same equipotential surface of f(x) (see Barbashin
[1]). Let $\bar x \in \omega$. We pass the trajectory $x(\bar x, t)$ through $\bar x$.
Any point of it also belongs to ω, hence $f(x(\bar x, t)) \equiv$ const and
therefore $\dot f(\bar x) = 0$. But we see from (2.7) that this is possible
only if $\bar x$ is a stationary point of (2.2). Whence, since all
stationary points of X are isolated, we obtain that ω consists
of a unique feasible stationary point $x_*$, which $x(x_0, t)$
approaches as t → ∞.

For each $x = x(x_0, t)$ one can define $y = y(x_0, t)$ from
(2.3). Since the function y of x is continuous, the existence
of $\lim_{t \to \infty} x(x_0, t)$ implies the existence of $y_* = \lim_{t \to \infty} y(x_0, t)$.
Set $u_*^i = y_*^i$ for $i \in [1:e]$, and $v_*^j = y_*^{e+j}$ for $j \in [1:c]$. From
the preceding lemma it follows that $x_* \in X$. We show that at the
point $[x_*, u_*, v_*]$ the complementarity condition is satisfied and
$v_* \ge 0$. At the limit point $x_*$ the conditions (2.8) are
satisfied. Noting (2.1), we obtain that if $v_*^j \ne 0$ then
$\gamma(-h^j(x_*)) = h^j(x_*) = 0$, and hence the complementarity condition
(1.6.5) holds. From (2.6) we have

$$h^j(x(x_0, t)) = h^j(x_0) \exp(\beta^j(t)), \tag{2.9}$$

where

$$\beta^j(t) = \int_0^t \frac{\gamma(-h^j(x(x_0, \tau)))}{h^j(x(x_0, \tau))}\, v^j(x_0, \tau)\, d\tau.$$
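If one assumes that $v_*^j < 0$, then $h^j(x_*) = 0$ and there is
a $\bar t$ such that for all $t > \bar t$

$$\frac{\gamma(-h^j(x(x_0, t)))}{h^j(x(x_0, t))}\, v^j(x_0, t) > 0, \quad \dot\beta^j(t) > 0, \quad \beta^j(t) \ge \beta^j(\bar t).$$

Then $\lim_{t \to \infty} \beta^j(t) \ge \beta^j(\bar t)$, and by (2.9) for $h^j(x_0) < 0$ we have
$\lim_{t \to \infty} h^j(x(x_0, t)) < 0$, which contradicts the complementarity
condition, by which $h^j(x_*) = 0$. Hence $v_* \ge 0$. The limit point
$[x_*, u_*, v_*]$ is thus a Kuhn-Tucker point. ///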


It is easy to see that the conditions of the Theorem may be
relaxed by requiring that they all hold on the set Ω, rather
than on X.

3. ESTIMATION OF THE CONVERGENCE RATE 

For simplicity, we consider the case where γ(z) = z. The method
(2.2) and the formula (2.3) do not change, and the formula for
Γ(x) and the equation (2.6) become

$$\Gamma(x) = \Phi_x^T(x)\, \Phi_x(x) - D(\Phi(x)), \tag{2.10}$$
$$\frac{d\Phi}{dt} = -D(\Phi(x))\, y. \tag{2.11}$$


For further study, the following elaboration of the method is
useful. As in Subsection 1.7.4, we introduce here the additional
vector of artificial variables $p \in E^c$ and consider the
minimization problem (1.7.17), equivalent to (1.6.1), in the space $E^{n+c}$
with equality-type constraints only. By the formula (1.7.18), we
form the Lagrangian L(x, p, y). The reduced-gradient method (1.2)
applied to solving this problem is the following:

$$\frac{dx}{dt} = -L_x(x, p, y) = -[f_x(x) + g_x(x)\, u + h_x(x)\, v], \tag{2.12}$$
ahs, DP, y)=—z D(p)0. Sages 


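Here $y = [u, v] \in E^m$. Let

$$z = [x, p], \quad R(z) = [g^1(x), \ldots, g^e(x),\ h^1(x) + (p^1)^2, \ldots, h^c(x) + (p^c)^2].$$

For determining the vector y we have an analog of the equation
(1.3):

$$\Phi_x^T(x)\, \Phi_x(x)\, y + \Phi_x^T(x)\, f_x(x) + \begin{bmatrix} 0_e \\ D(p)\, D(p)\, v \end{bmatrix} = 0. \tag{2.14}$$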


The system (2.12), (2.13) is solved so that R(z) is its integral.
Let the initial point $[x_0, p_0]$ satisfy the conditions

$$g(x_0) = 0, \quad h(x_0) + D(p_0)\, p_0 = 0. \tag{2.15}$$

Then along the trajectories of (2.12), (2.13) we have

$$h(x) + D(p)\, p = 0. \tag{2.16}$$

Noting that $R_x(x, p) = \Phi_x(x)$, we conclude that the system (2.2)
coincides with the system (2.12), (2.13). We can determine p
from (2.16) and then omit (2.13). In the sequel we assume that
the conditions (2.15) are satisfied.


Let 


L'(2, y)=L*(x, p, y), R(2)=R, p), 
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a [Se heya 1 eh ie fee Daten 
Re [ Poe L, (2; y) [pan | 


We can write the system (2.12), (2.13) and the conditions
(2.14) in the form

$$\frac{dz}{dt} = -L_z(z, y), \quad R_z^T(z)\, L_z(z, y) = 0. \tag{2.17}$$


Assuming that the functions defining the problem are twice 
differentiable, we differentiate ¢$(x,y) = > leet) by the 


system (2.17) and obtain 


dob _ ZL Se owe alt eT 
ate) ~(b;) Loe be (Lids Wet; 
vii Lr oe. 
Ct See ee 
, 
The matrix on is given by the formula (1.7.22), We assume 


that the conditions of Theorem 1.7.5 are satisfied and, further- 
more, the matrix L (u,v) is uniformly positive definite on 
the cone K,(x) (see. Definition. 1, %/, and the, formula CL. 7.23)). 


lay (C2qal70) 


a9) —_C,) Lip = —2C,9 (2), 


p (t 
dt 
p= Oe"; 


passing back to the original notation, we obtain

$$\|L_x(x(x_0, t), y(t))\|^2 + \|D(\sqrt{-h(x(x_0, t))})\, v(t)\|^2 \le \big[\|L_x(x_0, y_0)\|^2 + \|D(\sqrt{-h(x_0)})\, v_0\|^2\big]\, e^{-2C_1 t}, \tag{2.18}$$

implying that the method converges exponentially to a Kuhn-Tucker
point.


4. SPECIAL CASES


Suppose we solve the problem (1.6.29) in which the set X is
defined by the condition (1.6.2),

$$U = \{x \in E^n : x \ge 0\}, \quad U_0 = \{x \in E^n : x > 0\}.$$


Assume the point Xo € Up 1X is known. Then, instead of (2.2) 


the system 


= —D (yp (x)) [f, (x) +®, (x) y] (2.19) 


Lseintesratved.s | [he vectorm:y E" is determined from the system 


$$\Gamma(x)\,y + \Phi_x^T(x)\,D(\gamma(x))\,f_x(x) = 0,$$
$$\Gamma(x) = \Phi_x^T(x)\,D(\gamma(x))\,\Phi_x(x) + D(\gamma(-\Phi(x))).$$


Eliminating the vector $y$, we come to a system of the form (2.4),
where

$$M(x) = D(\gamma(x))\,[I_n - \Phi_x(x)\,\Gamma^{-1}(x)\,\Phi_x^T(x)\,D(\gamma(x))].$$


If instead of the condition $x \ge 0$ the constraint $x \ge a$ is imposed,
then in the formulas we need to write $D(\gamma(x-a))$ in place
of $D(\gamma(x))$. If the constraints have the form a < xt < a OL 
qJ = hd (x) < as, then two barrier vector functions ¥4¢), 
YoCh(x)) are introduced whose ye! and ie coordinates are, 
for example 1408) = tana: Vee Oe 

¥a(h(x)) = (hI (x) - (2) Cad SHI R)L The spatems (223? ahaetansy 


have the form 


& = —D(y, (x) (f, +,y), 


[D:D (v1 (x))®, +D (y2(® (x)))] yADID (9, (x) Fy = Onn. 


Constraints of this kind do not raise the order of the linear system
(2.3), which makes the computations much easier.

Consider now the linear programming problem 


$$\min_{x \in X} a^T x, \qquad X = \{x \in E^n : Ax = b,\ x \ge 0\}, \qquad (2.20)$$

where $x, a \in E^n$, $b \in E^e$, and $A$ is an $e \times n$ matrix. The dual
problem to (2.20) consists in finding

$$\max_{u \in U} b^T u, \qquad U = \{u \in E^e : A^T u \le a\}.$$


Setting $\gamma(z) = z$, we obtain that the method (2.19) for solving
the primal problem leads to the system

$$\dot{x} = D(x)(A^T u - a), \quad \text{where} \quad A D(x) A^T u = A D(x)\,a. \qquad (2.21)$$

In this case $a^T \dot{x} = -\|D(\sqrt{x})(a - A^T u)\|^2 \le 0$ if $x_0 > 0$ and
$A x_0 = b$. Analogously, for the dual problem 
2 
i = pak, wheres UIA Acer pCa “we a9 x! Sal Dene (Bee2) 


we have the inequality 


pia = |[b-Ax||? + x D(x)(a-Atu) > 0 


if $A^T u_0 < a$. The relaxation method (2.21) of solving the primal
problem is effective for solving problems of large dimension but
with a small number of constraints (n >> e) since in implementing
the method a matrix of low order is inverted. Similarly, the method
(2.22) is convenient if e >> n. The methods (2.21) and (2.22)
undergo only slight changes for quadratic programming problems.
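
The primal relaxation method can be illustrated by a small numerical sketch. It assumes that (2.21) has the form dx/dt = D(x)(A^T u - a) with A D(x) A^T u = A D(x) a, integrates it by Euler's formula, and halves the step whenever positivity or monotone decrease of a^T x would be lost. The function names, step size, and the test problem are illustrative assumptions, not part of the original text.

```python
import numpy as np

def relaxation_direction(A, a, x):
    """Right-hand side of the assumed form of (2.21): D(x)(A^T u - a)."""
    AD = A * x                                  # equals A @ diag(x)
    u = np.linalg.solve(AD @ A.T, AD @ a)       # e-by-e system A D(x) A^T u = A D(x) a
    return x * (A.T @ u - a), u

def solve_lp(A, a, x0, alpha=0.1, iters=500):
    """Euler integration; the step is halved until x stays positive
    and the objective a^T x does not increase (hypothetical safeguard)."""
    x = np.asarray(x0, dtype=float)
    u = None
    for _ in range(iters):
        direction, u = relaxation_direction(A, a, x)
        step = alpha
        while step > 1e-12:
            x_new = x + step * direction
            if np.all(x_new > 0) and a @ x_new <= a @ x + 1e-12:
                x = x_new
                break
            step *= 0.5
    return x, u

# Illustrative problem: min x1 + 2*x2 + 3*x3 subject to x1 + x2 + x3 = 1, x >= 0.
A = np.array([[1.0, 1.0, 1.0]])
a = np.array([1.0, 2.0, 3.0])
x_opt, u_opt = solve_lp(A, a, np.full(3, 1.0 / 3.0))
```

Only an e-by-e linear system is solved per step, which is the reason the text recommends the primal method when n >> e.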


Suppose the problem

$$\min_{x \in X_1} f(x), \qquad X_1 = \Big\{x \in E^n : \sum_{i=1}^{n} x^i = 1,\ x \ge 0\Big\} \qquad (2.23)$$

is to be solved. The method (2.19) leads to the following system: 


ana" le Yap fae} (2.24) 


tal 


If instead of X we take the set 
n 
Xs ={xe Br D>) =) x of, 
t=1 


then the system (2.24) does not change; however as initial data 
for the Cauchy problem one needs to take an interior point of Xo. 
The problem (2.23) is often encountered in applications. 
Here are two examples. 
Let the function $\psi(z)$ be defined on $E^s$. It is required to
find the minimum of $\psi(z)$ on the convex hull of $\{z_1, z_2, \ldots, z_n\}$.
The problem reduces to finding 


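$$\min_{x \in X_1} \psi\Big(\sum_{i=1}^{n} x^i z_i\Big), \qquad X_1 = \Big\{x \in E^n : \sum_{i=1}^{n} x^i = 1,\ x \ge 0\Big\}. \qquad (2.25)$$

Setting $f(x) = \psi\big(\sum_{i=1}^{n} x^i z_i\big)$, we come to the problem (2.23).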


In solving antagonistic two-player games with an infinite number
of states we introduce probability measures $\mu$ and $\nu$ on the
$\sigma$-algebra of subsets of $Z$ and $Y$, respectively:

$$\max_{\mu} \min_{\nu} \int_Z \int_Y F(z, y)\,\mu(dz)\,\nu(dy).$$

The problem reduces to the following:

$$\max_{\mu} \min_{y \in Y} \int_Z F(z, y)\,\mu(dz).$$

Approximating the measure $\mu$ by an atomic one, we come to a problem
close to (2.23):

$$\max_{x \in X_1} \min_{y \in Y} \sum_{i=1}^{n} F(z_i, y)\,x^i.$$


This argument has been studied in more detail in Evtushenko and 
Ziadane| ds. f 


Suppose the problem

$$\min_{x \in X_3} f(x), \qquad X_3 = \{x \in E^n : a \le x \le b\}$$

is being solved. Treating these constraints as those of the form
$h(x) \le 0$ and using (2.2) and (2.3), we easily see that the system
(2.3) has an analytic solution. Substituting the vector $y$ found
from (2.3) into (2.2), we have

$$\frac{dx^i}{dt} = -\delta_i(x)\,\frac{\partial f(x)}{\partial x^i}, \qquad \delta_i(x) = \frac{(x^i - a^i)(b^i - x^i)}{b^i - a^i + (x^i - a^i)(b^i - x^i)}.$$

For $a^i < x^i < b^i$ we have $\delta_i(x) > 0$; if $x^i \to a^i$ or $x^i \to b^i$, then
$\delta_i(x) \to 0$, due to which the trajectories do not leave the
feasible set.
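
The behaviour of this flow near the faces of the box can be checked with a short sketch. It assumes the scaling factor has the form (x - a)(b - x) / (b - a + (x - a)(b - x)) in each coordinate and integrates by Euler's formula with a fixed step; the function name `box_flow`, the step size, and the quadratic test function are hypothetical choices made only for illustration.

```python
import numpy as np

def box_flow(grad, x0, a, b, alpha=0.25, iters=500):
    """Euler integration of dx/dt = -delta(x) * grad f(x) on the box a <= x <= b.
    delta vanishes on the faces, so the iterates slow down as they approach them."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        p, q = x - a, b - x
        delta = p * q / (b - a + p * q)     # assumed form of the scaling factor
        x = x - alpha * delta * grad(x)
    return x

# Illustrative problem: minimize 0.5*||x - c||^2 on the unit square.
c = np.array([2.0, 0.5])
a = np.zeros(2)
b = np.ones(2)
x_box = box_flow(lambda x: x - c, np.array([0.5, 0.5]), a, b)
```

In this example the unconstrained minimizer lies outside the box in the first coordinate, so the iterates approach the face x^1 = 1 without crossing it, while the second coordinate stays at its interior optimum.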


3. A DISCRETE VERSION OF THE REDUCED-GRADIENT METHOD 


We integrate the system (2.1) by Euler's formula 


$$x_{k+1} = x_k - \alpha_k M(x_k)\,f_x(x_k). \qquad (3.1)$$


It will be shown below that if the equality-type constraints
depend linearly on $x$, then they remain constant for any values
of $\alpha_k$; in this case we can obtain a high convergence rate of the
method since one can take relatively large values of $\alpha_k$. In
numerical implementation of the method, the integration step $\alpha_k$
is usually the same on each iteration, but it needs to be
checked additionally whether the relaxation as well as feasibility
conditions of the point $x_{k+1}$ have been satisfied with respect to
the inequality-type constraints. If these constraints are violated,
then the step needs to be reduced until these conditions
are finally satisfied. As was shown in Section 5.2, near the
hypersurface $h^j(x) = 0$ in the method (3.1), the motion in the
direction toward this hypersurface automatically becomes faster. Due
to this fact the step needs to be reduced relatively seldom.

Numerical computations show that in a number of cases in or- 
der to integrate the system (2.71) it is’ useful to follow LINO ae net 
methods of integration. If there are no nonlinear constraints 
among the equality-type constraints, then the process (3.1) slows 
down considerably since one needs to take sufficiently small inte— 
gration steps to guarantee the smallness of WAS Ie 

Let us prove the convergence of the process (3.1), considering
only the case where the vector function $\Phi(x)$ depends linearly
on $x$, $\gamma(z) = z$, and the step $\alpha_k$ in (3.1) is constant. If the
conditions of Lemma 5.2.1 are satisfied, then one can determine
the maximum of the norm of the matrix $M(x)$ on $X$ using the


relation: 2 
; = 
X= max ae oc. 
xeEexX xX EEN | x | 
Set
$$\nu = \max_{j \in [1:m]} \max_{x \in X} y^j(x).$$
Here $y(x)$ is determined from (2.3), (2.10). Below we will write
$y_k = y(x_k)$.


THEOREM 5.3.1. Let the conditions of Theorem 5.2.1 be satisfied, 


let the vector function $\Phi(x)$ depend linearly on $x$, and let the
function $f_x$ satisfy a Lipschitz condition on $X$ with constant
Me Ma esal 
P al a : Tee tens 
Slisators; O7¢<e0us Bi aloe , TT ] and any nonstationary initial 
points xX, € X>, the sequence { 
point, £( x, 44) < £(x,) forbs Kes Ole 2a. 4.7 


XI converges to a Kuhn-Tucker 


e2. if, moreover, B||z|}7 > z Le (x, y(x))2 > blz ||7 for all 


n+ec 


? 


Ze H then we have the estimate 


[L. (Xe: y,)[?+)D (V —A(x,)) YP < 


< [| Li (Xs Yo)? 1D (V Ao) vol?) [1—ab +arBy, 69°? 


the sequence x, > Xx, where x, is a local solution of the pro- 
Demmi sGrels ie 


Here

$$L_x(x, y) = f_x + \Phi_x\,y.$$

Proof. From the linearity of $\Phi(x)$ it follows that

$$\Phi(x_{k+1}) = \Phi(x_k) + \Phi_x^T\,(x_{k+1} - x_k).$$

Substituting here (3.1), we obtain

$$\Phi(x_{k+1}) = \Phi(x_k) - \alpha\,\Phi_x^T(x_k)\,M(x_k)\,f_x(x_k).$$

Using the relations found in deriving (2.6), we come to the equality

$$\Phi(x_{k+1}) = \Phi(x_k) - \alpha\,D(\Phi(x_k))\,y_k.$$


Thus, if $g(x_0) = 0$ then $g(x_k) = 0$ for all $k$. If $|y_k^j| \le \nu < 1/\alpha$
and $\Phi^j(x_k) < 0$, then

$$\Phi^j(x_{k+1}) = \Phi^j(x_k)(1 - \alpha y_k^j) \le \Phi^j(x_k)(1 - \nu\alpha) < 0. \qquad (3.3)$$

Hence $x_0 \in X_0$ implies $x_k \in X_0$ for all $k$.
Using the Newton-Leibniz formula (see Appendix 1), we obtain

$$f(x_{k+1}) \le f(x_k) - \alpha f_x^T(x_k) M(x_k) f_x(x_k) + \frac{\ell\alpha^2}{2}\,\|M(x_k) f_x(x_k)\|^2. \qquad (3.4)$$

According to Lemma 5.2.2 the symmetric matrix $M(x_k)$ is positive
semi-definite. Let $\sqrt{M}$ be its square root, $M = \sqrt{M}\sqrt{M}$.
Introducing the vector $a = \sqrt{M} f_x$, we transform the inequality
(3.4) to the form

(ess) —F (2) < : 
<afap|[—14+34 | Setar [14%]. G35) 


Tare 


MGUISS. SEOWO ey =< 4 the sequence £( x, ) monotonically decreases. 
Since $f(x)$ is bounded from below on $X$, it follows that the
limit $\lim_{k \to \infty} f(x_k)$ exists. Hence

$$\lim_{k \to \infty}\,[f(x_{k+1}) - f(x_k)] = 0. \qquad (3.6)$$
From (3.5) we obtain the inequality 


2 oa 
O<SE (xp) M (x4) fre (4) <a cee 


Using the formulas derived in deducing (2.7), we can write (3.7)
as


eee 9 a 
[Lem YP+LD (V—O &)) yal <2 —Eoaen), 


Letting $k$ tend to infinity and noting (3.6), we obtain

$$\lim_{k \to \infty} \|f_x(x_k) + \Phi_x(x_k)\,y_k\| = \lim_{k \to \infty} \|D(\sqrt{-\Phi(x_k)})\,y_k\| = 0, \qquad (3.8)$$

i.e., at each limit point of the sequence $\{x_k\}$ the stationary
conditions (2.8) are satisfied. Since the stationary points are
isolated, the limits exist:

$$\bar{x} = \lim_{k \to \infty} x_k, \qquad \bar{y} = \lim_{k \to \infty} y_k, \qquad \bar{\Phi}^j = \lim_{k \to \infty} \Phi^j(x_k).$$
From (3.3) it also follows that

$$\Phi^j(x_{k+1}) = \Phi^j(x_0) \prod_{s=0}^{k} (1 - \alpha y_s^j).$$

If $\bar{\Phi}^j = 0$ then the infinite product

$$\prod_{s=0}^{\infty} (1 - \alpha y_s^j)$$

must be zero. For this it is necessary (see Fikhtengol'ts [1])
that

$$\sum_{s=0}^{\infty} \ln\,[1 - \alpha y_s^j] = -\infty.$$

But this is possible only if $\bar{y}^j \ge 0$. By (3.8) for $\bar{\Phi}^j < 0$ one
necessarily has $\bar{y}^j = 0$. Thus, the limit point of the sequence
$\{x_k\}$ is a Kuhn-Tucker point. The estimate of the convergence
rate of (3.2) is obtained like (2.18). ///


4. THE CONDITIONAL GRADIENT METHOD


1. GENERAL DESCRIPTION OF THE METHOD
We will consider the problem of minimizing a differentiable function
on a convex, compact set:

$$X_* = \operatorname{Arg\,min}_{x \in X} f(x). \qquad (4.1)$$

Let the point $x_k \in X$ be known. Then the differentiability of
$f(x)$ yields the representation

$$\Delta f = f(x) - f(x_k) = \langle f_x(x_k), x - x_k\rangle + \|x - x_k\|\,\beta(x_k, x - x_k),$$
where

$$\lim_{x \to x_k} \beta(x_k, x - x_k) = 0.$$
We will determine the vector $\bar{x}_k$ yielding the minimum of the
linear part of the increment $\Delta f$:

$$\bar{x}_k \in \operatorname{Arg\,min}_{x \in X} \langle f_x(x_k), x - x_k\rangle = W(x_k). \qquad (4.2)$$

Since the set $X$ is closed and bounded, it follows that the point-set
mapping $W$ is defined for all $x_k \in X$, and

$$W(x_k) \subset X, \qquad \delta_k = \langle f_x(x_k), \bar{x}_k - x_k\rangle \le 0.$$

If $\delta_k = 0$ then $x_k \in W(x_k)$ and at $x = x_k$ we have the
necessary (and sufficient for convex $f$) condition of the minimum
of a function on $X$ (see Theorems 1.4.1 and 1.4.3), and the
computation stops. We now consider the case where $\delta_k < 0$. As a new
point $x_{k+1}$ we take

$$x_{k+1} = x_k + \alpha_k(\bar{x}_k - x_k). \qquad (4.3)$$

Here $0 \le \alpha_k \le 1$. By the convexity of $X$, $x_{k+1} \in X$.

If $W(x_k)$ consists of a single point $\bar{x}_k$, then the vector
$\bar{x}_k - x_k$ is called the conditional antigradient of the function
$f$ at the point $x_k$. It is a vector to which one can move from $x_k$
remaining on $X$ and obtain the greatest projection onto the direction
of the antigradient of the objective function $f(x)$.

This is just the direction in which we move from $x_k$ in this method,
known as the method of the conditional gradient.

Several versions of this method are possible. The most prevalent
is to determine the size of the step $\alpha_k$ from the condition
of minimizing the value of $f(x)$ on the straight-line segment
joining $x_k$ and $\bar{x}_k$:

$$\alpha_k = \operatorname{Arg\,min}_{0 \le \alpha \le 1} f(x_k + \alpha(\bar{x}_k - x_k)). \qquad (4.4)$$

Obviously, in this case the sequence $f(x_k)$ does not increase
since

$$f(x_k) \ge f(x_k + \alpha_k(\bar{x}_k - x_k)) = f(x_{k+1}).$$


This first version of the method was suggested by Frank and
Wolfe. Some authors call it the method of linearization or linear
approximation since the basic computational difficulties arise in
solving the auxiliary problem (4.2) of finding the minimum of the
linearized function on the set $X$. This problem is not much simpler
than the initial problem, and only in particular cases when
the set $X$ has a simple structure is its use recommended. For
example, if $X$ is defined by linear constraints of equality-
and/or inequality-type, then finding $\bar{x}_k$ reduces to an easily
solved problem of linear programming.
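
A minimal sketch of the conditional gradient iteration follows, under the assumption that X is the unit simplex, so that the linearized auxiliary problem is solved by picking the vertex with the smallest gradient component. The step is found by an exact line search supplied by the caller. The names `conditional_gradient` and `exact_step`, the tolerance, and the quadratic test function are illustrative assumptions.

```python
import numpy as np

def conditional_gradient(grad, line_search, x0, tol=1e-8, max_iter=5000):
    """Conditional gradient (Frank-Wolfe) iteration on the unit simplex.
    Stops when the linearized decrease delta = <f_x(x), x_bar - x> is above -tol."""
    x = np.asarray(x0, dtype=float)
    delta = -np.inf
    for _ in range(max_iter):
        g = grad(x)
        x_bar = np.zeros_like(x)
        x_bar[np.argmin(g)] = 1.0            # vertex minimizing <g, x> over the simplex
        delta = g @ (x_bar - x)
        if delta > -tol:
            break
        alpha = line_search(x, x_bar - x)    # step in [0, 1]
        x = x + alpha * (x_bar - x)
    return x, delta

# Illustrative objective f(x) = 0.5*||x - c||^2 with c inside the simplex.
c = np.array([0.2, 0.3, 0.5])

def exact_step(x, d):
    """Exact minimizer of f along x + alpha*d for this quadratic, clipped to [0, 1]."""
    return float(np.clip(-((x - c) @ d) / (d @ d), 0.0, 1.0))

x_fw, delta_fw = conditional_gradient(lambda x: x - c, exact_step,
                                      np.array([1.0, 0.0, 0.0]))
```

For sets X other than the simplex, only the two lines that build `x_bar` would change; in general they would call a linear programming solver.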


The method can be somewhat simplified by approximately solving
the problem (4.4). In [1], Pshenichnyj and Danilin suggest, for
example, to take $\alpha_k = 2^{-i_0}$, where $i_0$ is the first index
($i = 0, 1, 2, \ldots$) for which the inequality

$$f(x_k + 2^{-i}(\bar{x}_k - x_k)) - f(x_k) \le 2^{-i-1}\langle f_x(x_k), \bar{x}_k - x_k\rangle \qquad (4.5)$$

is satisfied. If $f$ is twice differentiable on $X$, then for
determining the direction $\bar{x}_k - x_k$ one can use a more accurate
approximation of the function

$$f(x) - f(x_k) \approx \langle f_x(x_k), x - x_k\rangle + \tfrac{1}{2}(x - x_k)^T f_{xx}(x_k)(x - x_k).$$

As $\bar{x}_k - x_k$ we take the vector yielding the minimum of the
expression on the right-hand side. After this, the new point is found
by the formula (4.3); the size of the step $\alpha_k$ is determined from
(4.4) or from (4.5).
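
The step-halving rule can be sketched as a small routine. It assumes that the acceptance test (4.5) reads f(x_k + 2^{-i} d) - f(x_k) <= 2^{-i-1} <f_x(x_k), d> with d the conditional antigradient direction; the function name `halving_step` and the cap on the number of halvings are hypothetical.

```python
def halving_step(f, x, x_bar, delta, max_halvings=60):
    """Return alpha = 2**(-i0), where i0 is the first i = 0, 1, 2, ... such that
    f(x + 2**(-i) * (x_bar - x)) - f(x) <= 2**(-i - 1) * delta,
    with delta = <f_x(x), x_bar - x> < 0. Returns 0.0 if no step is accepted."""
    fx = f(x)
    alpha = 1.0
    for _ in range(max_halvings):
        if f(x + alpha * (x_bar - x)) - fx <= 0.5 * alpha * delta:
            return alpha
        alpha *= 0.5
    return 0.0

# Scalar illustration with f(x) = x**2 at x = 1, where f_x = 2.
step_far = halving_step(lambda x: x * x, 1.0, -1.0, 2.0 * (-1.0 - 1.0))
step_near = halving_step(lambda x: x * x, 1.0, 0.0, 2.0 * (0.0 - 1.0))
```

The routine works unchanged for NumPy vectors, since it only uses addition and scalar multiplication of x and x_bar.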
2. PROOF OF CONVERGENCE


THEOREM 4.5.1. Let $X$ be a compact convex set in $E^n$, let the
function $f$ be differentiable on $X$ and its gradient satisfy a
Lipschitz condition with constant $\ell > 0$. Then for any $x_0 \in X$

1. the sequence $\{x_k\}$ defined from the conditions (4.2) -
(4.4) is such that

$$\lim_{k \to \infty} \langle f_x(x_k), \bar{x}_k - x_k\rangle = 0; \qquad (4.6)$$

2. if, moreover, $f$ is convex, then the set of limit points
of the sequence $\{x_k\}$ is nonempty and belongs to the set $X_*$,
and we have the estimate

$$0 \le f(x_k) - f(x_*) \le \frac{C}{k}, \qquad (4.7)$$

where $C$ is a constant not depending on $k$, $x_* \in X_*$.


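Proof. Let $d$ denote the diameter of the set $X$:
$$d = \sup_{x, y \in X} \|x - y\|.$$

According to the rule (4.4) of choosing the step $\alpha_k$, for each
$0 \le \alpha \le 1$ the inequality
$$f(x_{k+1}) - f(x_k) \le f(x_k + \alpha(\bar{x}_k - x_k)) - f(x_k)$$
is satisfied. By Theorem 2 in Appendix I the inequality

$$f(x_k + \alpha(\bar{x}_k - x_k)) - f(x_k) \le \alpha\langle f_x(x_k), \bar{x}_k - x_k\rangle + \frac{\ell\alpha^2 d^2}{2} \qquad (4.8)$$

holds, implying in turn that for any $0 < \alpha \le 1$ we have the
estimate

$$|\delta_k| = |\langle f_x(x_k), \bar{x}_k - x_k\rangle| \le \frac{\Delta_k}{\alpha} + \frac{\ell\alpha d^2}{2}, \qquad \Delta_k = f(x_k) - f(x_{k+1}).$$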
The sequence $\{f(x_k)\}$ is nonincreasing and bounded from below
(since all the $x_k$ belong to a compact set). Hence $\{f(x_k)\}$
converges as $k \to \infty$ and $\lim_{k \to \infty} \Delta_k = 0$. Passing to the limit in the
preceding inequality as $k \to \infty$, we obtain the estimate

$$0 \le \limsup_{k \to \infty} |\delta_k| \le \frac{\ell\alpha d^2}{2}$$

holding for any $0 < \alpha \le 1$. Letting $\alpha$ tend to zero, we get the
required property (4.6).


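If $f(x)$ is convex, then for any $x \in X$

$$f(x) - f(x_k) \ge \langle f_x(x_k), x - x_k\rangle \ge \min_{x \in X}\langle f_x(x_k), x - x_k\rangle = \langle f_x(x_k), \bar{x}_k - x_k\rangle = \delta_k \to 0.$$

Therefore, $\lim_{k \to \infty} f(x_k) \le f(x)$ for any $x \in X$, whence we conclude that
each limit point of $\{x_k\}$ belongs to $X_*$.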

Note that during the calculations we can adjust the accuracy
of solving the initial problem depending on a bound for the error
$f(x_k) - f(x_*)$. Indeed, noting the convexity of $f$, for $x_k \in X$
and $x_* \in X_*$ we get

$$0 \le f(x_k) - f(x_*) \le \langle f_x(x_k), x_k - x_*\rangle$$
while

$$\langle f_x(x_k), x_k - x_*\rangle \le \langle f_x(x_k), x_k - \bar{x}_k\rangle.$$

Hence

$$0 \le f(x_k) - f(x_*) \le \langle f_x(x_k), x_k - \bar{x}_k\rangle = -\delta_k \ge 0. \qquad (4.9)$$

This estimate can be used as a criterion for ending the computations.




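To estimate the rate of convergence, we rewrite (4.8) as

$$f(x_k) - f(x_{k+1}) \ge \alpha|\delta_k| - \frac{\ell\alpha^2 d^2}{2}.$$

The maximum over $\alpha$ of the right side will be attained for

$$\bar{\alpha}_k = \frac{|\delta_k|}{\ell d^2}.$$

As $k \to \infty$, $\bar{\alpha}_k \to 0$. Hence there is an $N$ such that for all $k > N$
the quantity $\bar{\alpha}_k < 1$, and we have the estimate

$$f(x_k) - f(x_{k+1}) \ge \frac{1}{2\ell d^2}\,|\delta_k|^2.$$
Using (4.9), we obtain
$$f(x_k) - f(x_{k+1}) \ge \frac{1}{2\ell d^2}\,[f(x_k) - f(x_*)]^2.$$

Setting $a_k = f(x_k) - f(x_*)$, $A = 1/(2\ell d^2)$, we rewrite this
inequality in the form

$$a_k - a_{k+1} \ge A a_k^2. \qquad (4.10)$$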


If for all $k$ we have $a_k > 0$, then from (4.10) it follows that

$$\frac{1}{a_{k+1}} - \frac{1}{a_k} = \frac{a_k - a_{k+1}}{a_k a_{k+1}} \ge \frac{A a_k}{a_{k+1}} \ge A.$$

Summing this inequality for $k = 1$ to $s - 1$, we obtain

$$\frac{1}{a_s} - \frac{1}{a_1} \ge A(s - 1).$$

This implies that $a_s \le \dfrac{1}{A(s-1)}$; hence (4.7) holds. ///

One can similarly prove convergence while adjusting the step
using the formula (4.5) and choosing the direction from the
quadratic approximation.
5. THE GRADIENT-PROJECTION METHOD


1. THE IDEA OF THE METHOD


Consider the problem (4.1) in which we assume $f(x)$ to be convex
and differentiable and $X$ to be convex and compact. Then for
each point $x$ from the condition (1.1.2) one can define its
projection $p(x)$ onto the set $X$. The gradient-projection method
consists in constructing the sequence

$$x_{k+1} = p(x_k - \alpha_k f_x(x_k)). \qquad (5.1)$$

There are various ways of changing $\alpha_k$; depending on this,
one can derive various versions of the method. We examine the
simplest case.

The auxiliary operation of projection onto the set $X$ necessary
for implementing the method is, in general, of the same complexity
as the initial problem (4.1). Hence the gradient-projection
method as well as the conditional-gradient method are useful
when the set $X$ has a simple structure convenient for solving the
auxiliary problem (e.g., $X$ is a multidimensional parallelepiped).
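
For a multidimensional parallelepiped the projection reduces to clipping each coordinate to its bounds, so the iteration (5.1) takes only a few lines. The sketch below assumes a constant step and a quadratic test function; the name `gradient_projection` and all numerical values are illustrative.

```python
import numpy as np

def gradient_projection(grad, lower, upper, x0, alpha, iters=200):
    """Iteration x_{k+1} = p(x_k - alpha * f_x(x_k)) on the box lower <= x <= upper.
    The projection p onto a box is a coordinatewise clip."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        x = np.clip(x - alpha * grad(x), lower, upper)
    return x

# Illustrative problem: minimize 0.5*||x - c||^2 on the unit cube.
c = np.array([2.0, -1.0, 0.3])
x_gp = gradient_projection(lambda x: x - c, np.zeros(3), np.ones(3),
                           np.zeros(3), alpha=0.5)
```

Here the gradient has Lipschitz constant 1, so the step 0.5 lies well inside the range that the convergence theorem for this method allows.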


2. PROOF OF CONVERGENCE


LEMMA 5.5.1. If the set $X \subset E^n$ is closed and convex, then for
all $z \in E^n$, $x \in X$, we have the inequality

$$\langle p(z) - z, x - p(z)\rangle \ge 0. \qquad (5.2)$$
Proof. Consider the strictly convex differentiable function
$\varphi(x) = \|z - x\|^2$; its derivative is $\varphi_x(x) = -2(z - x)$. The
minimum of $\varphi(x)$ on $X$ is attained at the single point $p(z)$. Hence
by Theorem 1.4.1 it is necessary that for any $x \in X$ the inequality
(5.2) be satisfied. ///

THEOREM 5.5.1. Let $X$ be a convex, compact set; let the convex
differentiable function $f(x)$ be such that its gradient satisfies
a Lipschitz condition on $X$ with constant $\ell$. Then there exist
$\varepsilon_1$, $\varepsilon_2$ such that for any $\alpha_k$ satisfying the condition

$$0 < \varepsilon_1 \le \alpha_k \le \frac{2}{\ell + 2\varepsilon_2} \qquad (5.3)$$

the method (5.1) converges to $X_*$ and for $x_k \notin X_*$ we have the
estimate (4.7).


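Proof. Using Theorem 2 in Appendix I, we obtain

$$\Delta_k = f(x_{k+1}) - f(x_k) \le \langle f_x(x_k), x_{k+1} - x_k\rangle + \frac{\ell}{2}\,\|x_{k+1} - x_k\|^2.$$

We transform this inequality to the form

$$\Delta_k \le \frac{1}{\alpha_k}\langle \alpha_k f_x(x_k) + x_{k+1} - x_k,\ x_{k+1} - x_k\rangle - \Big(\frac{1}{\alpha_k} - \frac{\ell}{2}\Big)\|x_{k+1} - x_k\|^2.$$

Setting $z = x_k - \alpha_k f_x(x_k)$, $x = x_k$ in (5.2), we have

$$\Delta_k \le -\Big(\frac{1}{\alpha_k} - \frac{\ell}{2}\Big)\|x_{k+1} - x_k\|^2.$$

Adjusting $\alpha_k$ according to the rule (5.3), we have

$$\Delta_k \le -\varepsilon_2\,\|x_{k+1} - x_k\|^2 \le 0. \qquad (5.4)$$

Thus (5.1) is a relaxation method, and $x_{k+1} = x_k$ if and only if $x_k \in X_*$.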
Let $x_k \notin X_*$. Then, using the formula (1.2.14), we obtain

$$0 < a_k = f(x_k) - f(x_*) \le \langle f_x(x_k), x_k - x_*\rangle.$$

We transform this inequality as follows:

$$a_k \le \langle f_x(x_k), x_k - x_{k+1}\rangle + \frac{1}{\alpha_k}\langle x_k - \alpha_k f_x(x_k) - x_{k+1},\ x_* - x_{k+1}\rangle - \frac{1}{\alpha_k}\langle x_k - x_{k+1},\ x_* - x_{k+1}\rangle.$$

Again using the formula (5.2) and assuming $z = x_k - \alpha_k f_x(x_k)$,
$x = x_*$, we have for $a_k$

$$a_k \le \|x_{k+1} - x_k\|\,\Big(\|f_x(x_k)\| + \frac{1}{\alpha_k}\|x_* - x_{k+1}\|\Big).$$

Since $X$ is bounded, so are $\|f_x(x_k)\|$ and $\|x_* - x_{k+1}\|$ for any $k$.
Hence there exists $\sigma$ such that $a_k \le \sigma\|x_{k+1} - x_k\|$. From (5.4)
we obtain

$$a_{k+1} - a_k = \Delta_k \le -\varepsilon_2\,\|x_{k+1} - x_k\|^2.$$

Thus

$$a_k - a_{k+1} \ge \varepsilon_2\,\|x_{k+1} - x_k\|^2 \ge \frac{\varepsilon_2}{\sigma^2}\,a_k^2.$$

We have arrived at an inequality similar to (4.10), yielding in
turn the estimate (4.7). ///


Chapter 6 


NUMERICAL METHODS 
FOR SOLVING 
OPTIMAL CONTROL PROBLEMS 


Intensive efforts to develop numerical methods for solving opti- 
mal control problems began in the late 1950's for two reasons: 
first, at that time high-speed large-memory digital computers were 
becoming available, opening up the broadest possibilities for us- 
ing numerical methods and, second, the development of complex en- 
gineering systems in rocket and aircraft design, for example, 
created the need to solve a great multitude of optimal control 
problems. 

Numerical methods draw essentially on the basic results in
the general theory of optimal control. A great step forward in
this direction was the "maximum principle" of L.S. Pontryagin --
a canonical formulation of necessary conditions for optimality --
providing the basis for development and growth of a new direction 
in variational calculus and setting the stage for further diverse, 
prolific studies. 

One can distinguish several directions in the development of
numerical methods for solving optimal control problems substantially
different from each other. First of all are the primal
methods based on descent in control space. Then there are methods
based on Pontryagin's Maximum Principle, changing the initial problem
to a "two-point boundary value" problem. Another approach,
Moiseev [2], is based on variations in state space. Yet another
direction, developed by Fedorenko [1], involves concepts of the
linearization method. The conditional-gradient method and the
gradient-projection method have been extended to optimal control
problems by Dem'yanov and Rubinov [1]. Much effort has also been
expended on numerical methods based on Bellman's dynamic programming
(Bellman [1]).

In one of the early monographs on numerical methods of optimal 
control (Moiseev [2]), a casual mention is made of the possibility 
of using nonlinear programming methods. This idea was subsequently
developed extensively, in particular, in Polyak [1], Ermol'ev,
Gulenko, and Tsarenko [1], Propoj [2], and Tabak and Kuo [1].

In this book, methods for solving optimal control problems 
are presented based on the concepts of nonlinear programming theory. 
This approach turned out to be extraordinarily efficient for many 
reasons: many earlier heuristic algorithms are now well understood, 
the possibility of generalizing them has emerged; it allowed the 
use of an enormous sophisticated arsenal of nonlinear programming 
methods and of unconstrained minimization methods, and, further- 
more, laid the foundation for developing methods of system optimi- 
zation with high accuracy; the nonlinear programming methods help 
solve complex problems of optimal control, including those with 
mixed constraints. 

The basic computational formulas to implement the nonlinear
programming approach can be found, for example, in Polyak [1];
however, Polyak's derivation of formulas for the first derivatives
cannot be used for more refined integration routines than Euler's
method, nor for the second derivatives of the objective functional.
This circumstance led us to improve the methodology of deriving
formulas. In Section 6.1, the technique for computing derivatives
for systems integrated by Euler's method is illustrated
by an example. Then similar formulas for systems integrated by
the Runge-Kutta method are developed, as well as formulas for
computing the second derivatives of an objective function. We can
next carry the sufficient conditions for a minimum, so elaborate
in nonlinear programming theory, over to discrete-time processes
approximating the initial optimal control problems (see Section
6.2). In Section 6.3, we show how the nonlinear programming methods
apply to control problems. It is possible to have the structural
continuity of all the methods: a change of the methods of
integrating the initial systems of differential equations leads
only to an algorithmic change of individual blocks for calculating
the objective function, and the functions determining the
constraints and their derivatives.

In most of the methods of Section 6.3, one constructs a
sequence of unconstrained minimization problems with changing
auxiliary objective functions of many variables. The local methods
used to solve the unconstrained minimization problems yield
local solutions of optimal control problems. The auxiliary problems
of unconstrained minimization have unique features of interest
of their own, and exploit special techniques based on the
discrete maximum principle.
The discrete maximum principle has been proved by Propoj [1],
Pearson [1], Halkin [1], and others. Among the many Soviet studies
of this topic we mention Boltyanskij [1], Gabasov [1], and
Yakovlev [1]. In this chapter, for reasons explained in Section
1.8, we call the "discrete maximum principle" the "discrete minimum
principle" and give necessary and sufficient conditions for it
in Section 6.4. Using these results, in Section 6.5 we show how
one can use the conditional-gradient method and the gradient-projection
method in order to solve auxiliary unconstrained minimization
problems, solving, as well, optimal control problems with
mixed constraints. In Section 6.6, these numerical methods are
generalized to problems involving control parameters, delays,
discontinuous right sides, and the minimal-time problem with simple
nondifferential functions. In Section 6.7, some test problems are
solved. The application to game problems is illustrated in Section
6.8.

The material of this chapter is a survey of the results ob- 
tained during recent years at the Computing Center of the USSR 
Academy of Sciences and is published partially in Grachev and 


Evtushenko [5], [6], and Evtushenko [12]. 


1. BASIC COMPUTATIONAL FORMULAS 
1. COMPUTATION OF THE FIRST DERIVATIVES FOR THE EULER SCHEME 


In Section 1.8, the necessary conditions for a minimum for optimal
control problems were given. In particular, the processes examined
are described by the non-autonomous system of ordinary differential
equations

$$\frac{dx}{dt} = f(x(t), u(t), t), \qquad 0 \le t \le T, \qquad x(0) = x_1, \qquad (1.1)$$

where $x(t) \in E^n$, $u(t) \in E^r$.

In constructing numerical methods for solving optimal control
problems one usually proceeds from "continuous" systems (1.1) to
their discrete approximations. To simplify our discussion, we
consider first the case where the system (1.1) is integrated by
the Euler scheme.

We decompose $[0, T]$ into $q - 1$ intervals by the points
$0 = t_1 < t_2 < \cdots < t_q = T$. We call $[t_i, t_{i+1}]$ the $i$-th interval
of integration and denote its length by $h_i = t_{i+1} - t_i$. Furthermore,
we set

$$t_i = \sum_{s=1}^{i-1} h_s, \qquad t_q = \sum_{s=1}^{q-1} h_s = T, \qquad x_i = x(t_i), \qquad u_i = u(t_i),$$

$$z_i = [x_i, u_i, t_i], \qquad z_q = [x_q, u_q, t_q], \qquad f(z_i) = f(x_i, u_i, t_i).$$

In the remainder of this chapter, $i$ takes on all possible
integer values in the interval [1:q-1]. Integrating the system
(1.1) by the Euler scheme, we have
$$x_{i+1} = x_i + h_i f(z_i).$$

It is convenient to write this system in the form
$$x_{i+1} = F(z_i), \qquad F(z_i) = x_i + h_i f(z_i). \qquad (1.2)$$

When we considered the "continuous" system (1.1), the control $u(t)$
was a vector function. For a given discrete approximation the
control is a finite-dimensional vector $w = [u_1, \ldots, u_q] \in E^{rq}$
which we call the complete control vector.
Substituting the components of the vector $w$ into the system
(1.2), we determine sequentially the components of the vector

$$x = [x_1, x_2, \ldots, x_q] \in E^{nq}$$

and call this vector the complete state vector. We next consider
the function of the two complete vectors $x$ and $w$:

$$R(x, w) = b(z_q) + \sum_{i=1}^{q-1} h_i B(z_i). \qquad (1.3)$$
Given the vector $w$, with the aid of (1.2) we determine uniquely
the vector $x$ and denote this dependence by writing: $x = x(w)$,
$R = R(x(w), w)$. The objective of this section is to derive formulas
for computing the first and second derivatives of a composite
function with respect to the components of the vector $w$. We
need these formulas to implement various numerical methods of
approximate solution of optimal control problems. In deriving the
formulas, no constraints will be imposed on the complete control
vector $w$, and it may be infeasible and nonoptimal. Accounting
for constraints and numerical methods will be described in Section
6.3.


Introduce the auxiliary n-dimensional vector

$$p_i = \frac{dR(x, w)}{dx_i}. \qquad (1.4)$$

Let us explain the meaning of a derivative. Let $\Delta$ denote
the n-dimensional vector of increments. We introduce a new complete
state vector

$$\tilde{x} = [x_1, x_2, \ldots, x_{i-1}, \tilde{x}_i, \tilde{x}_{i+1}, \ldots, \tilde{x}_q] \in E^{nq},$$
whose first $i-1$ components correspond to the components of the
vector $x$, $\tilde{x}_i = x_i + \Delta$; all subsequent components are obtained
from the recurrence relation (1.2) if we take $\tilde{x}_i$ as $x_i$ and
then increase $i$ to the value $i = q-1$. The vector $p_i$, called
the derivative of $R$ with respect to $x_i$ at the point $[x, w]$,
is defined by
$$\lim_{\|\Delta\| \to 0} \frac{1}{\|\Delta\|}\,[R(\tilde{x}, w) - R(x, w) - \langle p_i, \Delta\rangle] = 0.$$

It should be emphasized that in computing $p_i$, the complete control
vector $w$ remains constant. If the functions $R$, $F$ are
continuously differentiable with respect to the components of the
vector $x_i$, then the vector $p_i$ exists. For $i = q$, the vector $p_i$
is computed simply:

$$p_q = \frac{dR(x, w)}{dx_q} = \frac{\partial R(x, w)}{\partial x_q} = \frac{\partial b(z_q)}{\partial x_q}, \qquad (1.5)$$

since by the recurrence relations (1.2) none of the vectors
$x_1, x_2, \ldots, x_{q-1}$ depends on $x_q$. Let

$$R(x, w) = b(z_q) + \sum_{i=1}^{q-1} h_i B(z_i) = b(z_q) + \sum_{i=1}^{q-1} C(z_i), \qquad (1.6)$$

$$H(z_i, p_{i+1}) = C(z_i) + \langle F(z_i), p_{i+1}\rangle.$$


Noting the formula (1.2) expressing the dependence of $x_{i+1}$ on
the "preceding" vector $x_i$, one can write out a more detailed
formula for computing $p_i$:

$$p_i = \frac{\partial R}{\partial x_i} + \frac{\partial x_{i+1}}{\partial x_i}\,\frac{dR}{dx_{i+1}}.$$

By the formulas (1.2) - (1.6) this expression can be made concise:

$$p_i = C_x(z_i) + F_x(z_i)\,p_{i+1} = H_x(z_i, p_{i+1}). \qquad (1.7)$$
In what follows the subscripts $x$ and $u$ denote the partial
derivatives with respect to explicit components of the vectors $x$
and $u$, respectively. In particular,

$$H_x(z_i, p_{i+1}) = \frac{\partial H(z_i, p_{i+1})}{\partial x_i}, \qquad H_u(z_i, p_{i+1}) = \frac{\partial H(z_i, p_{i+1})}{\partial u_i}.$$


For the derivative of $R$ at the point $[x, w]$ with respect
to the $i$-th component $u_i \in E^r$ of the vector $w$ we write
$$\gamma_i = \frac{dR(x(w), w)}{du_i}.$$

The meaning of this formula is distinct from the definition of
the vector $p_i$; here we mean the differentiation of $R$ as of a
composite function. More precisely, let the complete vectors $x$
and $w$ be given. Introduce two new auxiliary complete vectors
$$\tilde{x} = [x_1, \ldots, x_i, \tilde{x}_{i+1}, \ldots, \tilde{x}_q] \in E^{nq},$$
$$\tilde{w} = [u_1, \ldots, u_{i-1}, u_i + \Delta, u_{i+1}, \ldots, u_q] \in E^{rq}.$$

All components of $\tilde{w}$ coincide with the corresponding components
of $w$ except the $i$-th component equal to $u_i + \Delta$, where $\Delta \in E^r$
is the increment vector. The first $i$ components of the vector
$\tilde{x}$ coincide with the first $i$ components of the vector $x$. The
complete vector $\tilde{x}$ obtains from (1.2) if one takes the vector $\tilde{w}$
as a complete control vector. We define the vector $\gamma_i$ from the
condition

$$\lim_{\|\Delta\| \to 0} \frac{1}{\|\Delta\|}\,[R(\tilde{x}, \tilde{w}) - R(x, w) - \langle \gamma_i, \Delta\rangle] = 0.$$

Using the rule for differentiating composite functions, we obtain 
$$\gamma_i = \frac{dR}{du_i} = C_u(z_i) + F_u(z_i)\,p_{i+1} = H_u(z_i, p_{i+1}),$$
$$\gamma_q = \frac{dR}{du_q} = \frac{\partial R}{\partial u_q} = \frac{\partial b(z_q)}{\partial u_q}. \qquad (1.8)$$

If in the formulas obtained all the $h_i$ tend to zero and
the number of steps $q$ tends to infinity, then the difference
equation (1.7) becomes the following ordinary differential equation

$$\frac{dp}{dt} = -[f_x(x(t), u(t), t)\,p(t) + B_x(x(t), u(t), t)], \qquad (1.9)$$

which in Mayer's problem (for $B \equiv 0$) coincides with equation
(1.8.3) of Chapter 1, which describes the change of impulses (adjoint
multiplier) in optimal control theory. Hence in the sequel
we will call the vector $p_i$ an impulse.

Given the complete control vector $w$, using (1.2) one can
determine in sequence the complete state vector $x$ and compute
the value of $R$. For a continuous system, this means integrating
the system (1.1) "from left to right." Next, using (1.7),
we compute the sequence of impulses $p_i$; in optimal control this is called
"integrating the impulse equations." After this we determine the
derivatives by the formula (1.8).
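
The forward and backward sweeps can be sketched for a scalar state and control. The sketch assumes the Euler scheme (1.2), the cost (1.3), the impulse recursion p_i = C_x + F_x p_{i+1} with C = h_i B and F = x + h_i f, and the derivative formula dR/du_i = C_u + F_u p_{i+1}. The particular functions f, B, b and all names are hypothetical examples; array index 0 corresponds to i = 1 in the text.

```python
import numpy as np

# Hypothetical scalar problem: dx/dt = -x + u, running cost B, terminal cost b.
def f(x, u, t):   return -x + u
def f_x(x, u, t): return -1.0
def f_u(x, u, t): return 1.0
def B(x, u, t):   return 0.5 * (x**2 + u**2)
def B_x(x, u, t): return x
def B_u(x, u, t): return u
def b(x, u, t):   return x**2
def b_x(x, u, t): return 2.0 * x
def b_u(x, u, t): return 0.0

def forward(w, x1, t):
    """Euler scheme x_{i+1} = x_i + h_i f(z_i); returns the states and R(x, w)."""
    q, h = len(t), np.diff(t)
    x = np.empty(q)
    x[0] = x1
    R = 0.0
    for i in range(q - 1):
        R += h[i] * B(x[i], w[i], t[i])
        x[i + 1] = x[i] + h[i] * f(x[i], w[i], t[i])
    return x, R + b(x[-1], w[-1], t[-1])

def gradient(w, x1, t):
    """Backward sweep for the impulses, then dR/du_i for every i."""
    q, h = len(t), np.diff(t)
    x, R = forward(w, x1, t)
    p = b_x(x[-1], w[-1], t[-1])                 # p_q
    g = np.empty(q)
    g[-1] = b_u(x[-1], w[-1], t[-1])             # derivative with respect to u_q
    for i in range(q - 2, -1, -1):
        # H_u(z_i, p_{i+1}) = C_u + F_u p_{i+1}
        g[i] = h[i] * (B_u(x[i], w[i], t[i]) + f_u(x[i], w[i], t[i]) * p)
        # p_i = C_x + F_x p_{i+1}
        p = h[i] * B_x(x[i], w[i], t[i]) + (1.0 + h[i] * f_x(x[i], w[i], t[i])) * p
    return R, g

t_grid = np.linspace(0.0, 1.0, 11)
w_ctrl = np.sin(3.0 * t_grid)
R_val, g_val = gradient(w_ctrl, 1.0, t_grid)
```

One forward sweep and one backward sweep give all q derivatives, whereas differencing R directly would need a separate forward sweep for each component of w.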

It is worthwhile to compare the formulas (1.4) and (1.7) for
the same vector $p_i$. One can use (1.4) for numerical computations
without introducing the impulse equations (1.7). Fixing $w$ and
giving $x_i$ increments, we integrate (1.2) from the $i$-th step to
the $q$-th, calculate the changes of $R$, and approximately find
the vector $p_i$. These computations are more cumbersome than those
used in (1.7) since calculating only once "from right to left" by
(1.7) one can immediately obtain all the vectors $p_i$. Nevertheless,
it is convenient to use (1.4) for debugging the programs
since it helps verify the correctness of the programmed formulas
(1.7).
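
Such a check can be sketched as follows for a Mayer-type cost. The sketch assumes a scalar system with a constant step h, a terminal cost R = x_q^2, and controls u_1, ..., u_{q-1}; the impulses from the backward recursion are compared with central differences obtained by perturbing x_i and re-integrating from step i. All functions and names are hypothetical.

```python
import numpy as np

def f(x, u):   return u - x**3          # hypothetical right-hand side
def f_x(x, u): return -3.0 * x**2

def tail_cost(x_i, i, w, h):
    """Integrate the Euler scheme from step i (0-based) to the end; return R = x_q**2."""
    x = x_i
    for j in range(i, len(w)):
        x = x + h * f(x, w[j])
    return x**2

def impulses(x1, w, h):
    """States by the forward sweep and impulses by p_i = F_x(z_i) p_{i+1}, p_q = 2 x_q."""
    q = len(w) + 1
    x = np.empty(q)
    x[0] = x1
    for i in range(q - 1):
        x[i + 1] = x[i] + h * f(x[i], w[i])
    p = np.empty(q)
    p[-1] = 2.0 * x[-1]
    for i in range(q - 2, -1, -1):
        p[i] = (1.0 + h * f_x(x[i], w[i])) * p[i + 1]
    return x, p

def impulses_fd(x, w, h, delta=1e-6):
    """Direct use of the definition of p_i: perturb x_i and difference R."""
    return np.array([(tail_cost(x[i] + delta, i, w, h)
                      - tail_cost(x[i] - delta, i, w, h)) / (2.0 * delta)
                     for i in range(len(x))])

w_test = np.full(10, 0.5)
x_traj, p_exact = impulses(1.0, w_test, 0.1)
p_check = impulses_fd(x_traj, w_test, 0.1)
```

The finite-difference version re-integrates the tail of the trajectory twice for every i, which is the extra cost the text refers to.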

In this section, using the finite-difference approximation
(1.2) of the system of differential equations (1.1) we derive
recurrence relations for computing the impulses (1.7), after which
by passing to the limit we find the differential equation (1.9).
One might try to argue backward, starting from the "continuous"
systems (1.1) and (1.9) and going to the discrete ones (1.2),
(1.7) -- but a few difficulties arise on the way. Indeed, formal
integration of the system (1.9) with the same error of order $h_i^2$
leads to different results. One can write, for example, the
following two formulas equivalent in the sense of accuracy of
integrating (1.9):


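$$p_i = p_{i+1} + h_i\left[f_x(z_i)\,p_i + B_x(z_i)\right], \qquad p_i = p_{i+1} + h_i\left[f_x(z_i)\,p_{i+1} + B_x(z_i)\right]. \tag{1.10}$$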
However, only the latter is equivalent to (1.7). At the same time,
using (1.7) and also (1.8), one can obtain an exact formula for
computing the derivative of $R$. Hence it is preferable to use the
formula (1.7) rather than (1.10), although the difference between
them is small, that is, of the same order as the error of
integration. In schemes of integration of the system (1.1) of a
higher order of accuracy this difference becomes more crucial.
Hence instead of formal integration of the differential impulse
equation, exact formulas need to be used. Formulas for computing
the derivatives of $R$ are usually employed to implement various
numerical methods of unconstrained minimization. The
differentiation errors may substantially complicate the
computations. This is especially serious if refined minimization
methods are used, e.g., the conjugate-gradient method.

It is often imperative to study systems of the form (1.2) in
solving optimization problems for systems the behavior of which is
described by recurrence relations only; their "continuous"
notation (1.1) is not quite sufficient, since the coefficients $h_i$
need not be small quantities. In analyzing such systems, exact
differentiation formulas are particularly appropriate.

In making discrete approximations of the initial system (1.1),
one can adopt special hypotheses concerning the behavior of the
control within the integration interval. In (1.2) it is assumed
that the control is constant within the integration interval. One
can postulate, for example, that within each $i$-th integration
interval the control is a linear function of time: for
$t_i \le t \le t_{i+1} = t_i + h_i$ we have

$$u(t) = u_i + \frac{t - t_i}{h_i}\,(u_{i+1} - u_i).$$

For the Euler scheme, the computational formulas do not change,
but for more exact integration schemes new terms appear.


2. COMPUTATION OF THE SECOND DERIVATIVES


Suppose that the functions $R(x,w)$ and $F(x_i,u_i,t_i)$ are twice
continuously differentiable with respect to the components of the
vectors $x$ and $w$. Let us find the matrix of the second
derivatives of the composite function $R(x(w), w)$ with respect to $w$.

Introduce the square symmetric matrices







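$$P_i = \frac{dp_i}{dx_i} = \frac{d^2R}{dx_i\,dx_i}, \qquad P_q = \frac{\partial^2 R(x,w)}{\partial x_q\,\partial x_q} = \frac{\partial^2 b(z_q)}{\partial x_q\,\partial x_q},$$

$$\frac{d}{du_i}\left(\frac{dR}{du_i}\right) = \frac{d^2R}{du_i\,du_i}, \qquad \frac{d^2R}{du_q\,du_q} = \frac{\partial^2 b(z_q)}{\partial u_q\,\partial u_q}.$$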


The interpretation of these formulas is analogous to the
definitions of the first derivatives of $R$ with respect to the
components of the vectors $x$ and $w$. In differentiating with
respect to the components of the vector $x$, the control vector
$w$ is assumed to be constant.

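Differentiating $p_i$ and $dR/du_i$ with respect to $x_i$ and $u_i$,
and taking into account (1.7), we obtain

$$\begin{aligned}
\frac{dp_i}{dx_i} &= P_i = H_{xx}(z_i, p_{i+1}) + F_x(z_i)\,P_{i+1}\,F_x^T(z_i),\\
\frac{dp_i}{du_i} &= H_{ux}(z_i, p_{i+1}) + F_u(z_i)\,P_{i+1}\,F_x^T(z_i),\\
\frac{d^2R}{du_i\,du_i} &= H_{uu}(z_i, p_{i+1}) + F_u(z_i)\,P_{i+1}\,F_u^T(z_i).
\end{aligned} \tag{1.11}$$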


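If $1 \le s < i \le q-1$, then we easily derive the following formulas:

$$\frac{d^2R}{du_s\,du_i} = \frac{\partial x_{s+1}}{\partial u_s}\,\frac{d^2R}{dx_{s+1}\,du_i} = F_u(z_s)\,\frac{d^2R}{dx_{s+1}\,du_i}, \qquad
\frac{d^2R}{dx_s\,du_i} = \frac{\partial x_{s+1}}{\partial x_s}\,\frac{d^2R}{dx_{s+1}\,du_i} = F_x(z_s)\,\frac{d^2R}{dx_{s+1}\,du_i}.$$

Similarly, for $1 \le i < s \le q-1$ we have

$$\frac{d^2R}{du_s\,du_i} = \frac{d^2R}{du_s\,dx_{i+1}}\,F_u^T(z_i), \qquad
\frac{d^2R}{du_s\,dx_i} = \frac{d^2R}{du_s\,dx_{i+1}}\,F_x^T(z_i). \tag{1.12}$$
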
From these formulas one can determine sequentially the elements of
the matrix $d^2R(x(w),w)/dw^2$. We give only part of this matrix,
writing the block matrices occupying the block columns with
subscripts from $s-1$ to $s+1$ and block rows with subscripts
from $s-1$ to $s+1$:

$$
\begin{array}{c|ccc}
 & s-1 & s & s+1 \\ \hline
s-1 & \dfrac{d^2R}{du_{s-1}\,du_{s-1}} & F_u(z_{s-1})\,\dfrac{d^2R}{dx_s\,du_s} & F_u(z_{s-1})\,\dfrac{d^2R}{dx_s\,du_{s+1}} \\[2ex]
s & \dfrac{d^2R}{du_s\,dx_s}\,F_u^T(z_{s-1}) & \dfrac{d^2R}{du_s\,du_s} & F_u(z_s)\,\dfrac{d^2R}{dx_{s+1}\,du_{s+1}} \\[2ex]
s+1 & \dfrac{d^2R}{du_{s+1}\,dx_s}\,F_u^T(z_{s-1}) & \dfrac{d^2R}{du_{s+1}\,dx_{s+1}}\,F_u^T(z_s) & \dfrac{d^2R}{du_{s+1}\,du_{s+1}}
\end{array}
$$

Thus, in computing the second derivatives it is necessary "to
integrate from right to left" not only the equation (1.7) for the
impulse vectors $p_i$ but also to recalculate the impulse matrices
$P_i$ and the matrices $d^2R/dx_s\,du_i$. Cases are possible where for
some i the vector bu = 0, which corresponds to "singular" 
regimes in optimal control. Obviously, these formulas still hold 
and are useful in theoretical studies of these cases. 

Although these formulas are cumbersome, they are expected to
be of considerable importance, since they open broad possibilities
of using the approach described in Chapter 4 for constructing
rapidly convergent computational procedures analogous to Newton's
method. Some examples of such computations are given in Section
6.3.


3. THE RUNGE-KUTTA SCHEMES 


For a number of practical problems, it is required to guarantee
high accuracy in integrating the initial system (1.1). The
simplest way to meet this condition is to reduce the integration
step in size. In that case, however, the dimension of the control
vector $w$ increases, making the optimization process much more
complicated. Another way of improving the integration accuracy is
to use more accurate integration formulas. We shall consider the
use of Runge-Kutta schemes. As before, the problem will be to
generate formulas for computing the derivatives of the composite
function $R(x(w),w)$ with respect to the components of the vector
$w$.

We integrate the system (1.1), using

$$x_{i+1} = x_i + h_i \sum_{j=1}^{p} g_j f(z_i^j), \tag{1.13}$$

where

$$z_i^j = [x_i^j, u_i^j, t_i^j], \qquad u_i^j = u(t_i^j),$$

and

$$x_i^j = x_i + \beta_{j-1} h_i f(z_i^{j-1}), \qquad t_i^j = t_i + \beta_{j-1} h_i, \tag{1.14}$$

where $g_j$, $\beta_j$ is a set of numbers, all $0 \le \beta_j \le 1$ and
$\beta_0 = 0$; hence the values $f(z_i^0)$ are inessential. In (1.13),
(1.14) and later in this section the indices $i$ and $j$ take on
integer values in the intervals $[1:q-1]$ and $[1:p]$, respectively.

To different parameters $g_j$ and $\beta_j$ there correspond
different integration schemes. Hence the formula (1.13) determines
a set of methods of numerical integration of the system (1.1)
usually called the family of Runge-Kutta methods. The error of
integration of the system (1.1) on the $i$-th step is estimated by
the difference $\eta(h_i) = x(t_{i+1}) - x_{i+1}$, where $x(t)$ is the
solution of (1.1) with the initial condition $x(t_i) = x_i$. The
quantity $\eta$ is a function of the integration step $h_i$. If
$f(x, u, t)$ is a sufficiently smooth function of its arguments,
then the function $\eta(h_i)$ is representable by a Taylor series:

$$\eta(h_i) = \sum_{j=0}^{s} \frac{\eta^{(j)}(0)}{j!}\,h_i^{\,j} + \frac{\eta^{(s+1)}(\theta h_i)}{(s+1)!}\,h_i^{\,s+1},$$

where $0 \le \theta \le 1$. The parameters of the integration method
$\beta_j$, $g_j$ are such that

$$\eta(0) = \eta'(0) = \ldots = \eta^{(s)}(0) = 0$$

for arbitrary sufficiently smooth functions $f(z)$. Here if
$\eta^{(s+1)}(0) \ne 0$, then $s$ is called the order of the error of
the integration method on one step.

A detailed analysis of the various schemes for integrating
ordinary differential equations may be found in many texts on
numerical methods. Here we limit ourselves to a few sets of
possible parameters from the family of Runge-Kutta methods.

If we set $p = 1$ in (1.13), we obtain the Euler scheme
considered above, which has a first-order error of integration.

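If we set $p = 2$, $g_1 = 0$, $g_2 = 1$, $\beta_1 = 1/2$, we obtain the
so-called Euler scheme with recalculation, with second-order error
of integration; the computations are made by the formulas:

$$x_{i+1} = x_i + h_i f(z_i^2), \qquad z_i^1 = [x_i, u_i^1, t_i], \qquad
x_i^2 = x_i + \frac{h_i}{2} f(z_i^1), \qquad t_i^2 = t_i + \frac{h_i}{2}. \tag{1.15}$$

A modified Euler scheme in which $p = 2$, $g_1 = g_2 = 1/2$, $\beta_1 = 1$
also has second-order error:

$$x_{i+1} = x_i + \frac{h_i}{2}\left[f(z_i^1) + f(z_i^2)\right], \qquad
x_i^1 = x_i, \quad x_i^2 = x_i + h_i f(z_i^1), \qquad
t_i^1 = t_i, \quad t_i^2 = t_i + h_i.$$
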
Among the schemes having fourth-order error, the most common is
one with

$$p = 4, \qquad g_1 = g_4 = 1/6, \qquad g_2 = g_3 = 1/3, \qquad \beta_1 = \beta_2 = 1/2, \qquad \beta_3 = 1, \qquad \beta_0 = 0.$$
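The parameter sets listed above can be tried out with one generic step routine. The sketch below assumes a scalar state, a constant step, and a hypothetical test equation $dx/dt = -x + u(t)$ with $u(t) = \sin t$ and $x(0) = 0$, whose exact solution is $x(t) = (\sin t - \cos t + e^{-t})/2$; the names `rk_step`, `SCHEMES` and `observed_order` are introduced here only for illustration. The list `beta` holds $\beta_0, \ldots, \beta_{p-1}$ and the list `g` holds $g_1, \ldots, g_p$.

```python
import math


def rk_step(f, u, x, t, h, g, beta):
    """One step of the family (1.13)-(1.14) for a scalar state."""
    stage_f = []
    for j in range(len(g)):
        if j == 0:
            xj, tj = x, t                          # beta_0 = 0
        else:
            xj = x + beta[j] * h * stage_f[j - 1]  # uses the previous stage only
            tj = t + beta[j] * h
        stage_f.append(f(xj, u(tj), tj))
    return x + h * sum(gj * fj for gj, fj in zip(g, stage_f))


SCHEMES = {
    "euler":               ([1.0], [0.0]),
    "euler_recalculation": ([0.0, 1.0], [0.0, 0.5]),
    "modified_euler":      ([0.5, 0.5], [0.0, 1.0]),
    "runge_kutta_4":       ([1 / 6, 1 / 3, 1 / 3, 1 / 6], [0.0, 0.5, 0.5, 1.0]),
}


def integrate(name, n, t_end=1.0):
    """Integrate the test equation on [0, t_end] with n equal steps."""
    g, beta = SCHEMES[name]
    f = lambda x, u, t: -x + u
    x, h = 0.0, t_end / n
    for i in range(n):
        x = rk_step(f, math.sin, x, i * h, h, g, beta)
    return x


def observed_order(name, n=20):
    """Estimate the order of the accumulated error by halving the step."""
    exact = 0.5 * (math.sin(1.0) - math.cos(1.0) + math.exp(-1.0))
    e1 = abs(integrate(name, n) - exact)
    e2 = abs(integrate(name, 2 * n) - exact)
    return math.log(e1 / e2, 2)
```

Halving the step and comparing the errors at the end of the interval gives a rough empirical estimate of the order of each scheme on this particular test equation.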
Let us go over to the derivatives of the composite function $R$.
By the formula (1.13) we write the expression for $R$ in the form

$$R(x, w) = b(x_q, u_q, t_q) + \sum_{i=1}^{q-1} h_i \sum_{j=1}^{p} g_j B(z_i^j),$$

where $x$ is a complete state vector and $w$ is a complete control
vector:

$$x = [x_1, x_1^2, \ldots, x_1^p, x_2, x_2^2, \ldots, x_{q-1}^p, x_q], \qquad
w = [u_1^1, \ldots, u_1^p, u_2^1, \ldots, u_{q-1}^p, u_q].$$


Introduce the auxiliary n-dimensional vectors

$$p_q = \frac{dR}{dx_q} = \frac{\partial b(z_q)}{\partial x_q}, \qquad
p_i = \frac{dR(x, w)}{dx_i}, \qquad
p_i^j = \frac{dR}{dx_i^j}. \tag{1.16}$$

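Noting the relations (1.13), (1.14) determining the vectors $x_{i+1}$,
$x_i^j$ as differentiable functions of the "preceding" vectors
$x_i^{j-1}$, $x_i$, we obtain

$$p_i = \frac{\partial R}{\partial x_i} + \sum_{j=2}^{p} \frac{\partial x_i^j}{\partial x_i}\,\frac{dR}{dx_i^j} + \frac{\partial x_{i+1}}{\partial x_i}\,\frac{dR}{dx_{i+1}}, \qquad
p_i^j = \frac{\partial R}{\partial x_i^j} + \frac{\partial x_i^{j+1}}{\partial x_i^j}\,\frac{dR}{dx_i^{j+1}} + \frac{\partial x_{i+1}}{\partial x_i^j}\,\frac{dR}{dx_{i+1}}.$$

Noting (1.13), (1.14), (1.16) we rewrite these expressions in the
form of recurrence relations

$$\begin{aligned}
p_i &= p_{i+1} + \sum_{j=1}^{p} p_i^j, \qquad p_q = \frac{\partial b(z_q)}{\partial x_q},\\
p_i^j &= h_i\left[g_j B_x(z_i^j) + f_x(z_i^j)\left(g_j\,p_{i+1} + \beta_j\,p_i^{j+1}\right)\right], \qquad j \in [1:p-1],\\
p_i^p &= h_i\,g_p\left[B_x(z_i^p) + f_x(z_i^p)\,p_{i+1}\right].
\end{aligned} \tag{1.17}$$
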


To simplify the formulas, it is convenient to add one
additional coefficient $\beta_p$ to the coefficients $\beta_j$, assuming
$\beta_p = 0$. In this case, the vector $p_i^{p+1}$ will be multiplied
everywhere by $\beta_p$; hence its value is not essential. Define the
function

$$H(z_i^j, p_{i+1}, p_i^{j+1}) = h_i\left[g_j B(z_i^j) + \left\langle f(z_i^j),\ g_j\,p_{i+1} + \beta_j\,p_i^{j+1}\right\rangle\right].$$

We rewrite the relations (1.17) compactly:

$$p_i^j = H_x(z_i^j, p_{i+1}, p_i^{j+1}).$$
We derive similarly the next, basic formula for computing
the components of the vector of composite derivatives of $R$ with
respect to $w$:

$$\frac{dR(x(w), w)}{du_i^j} = \frac{\partial R}{\partial u_i^j} + \frac{\partial x_i^{j+1}}{\partial u_i^j}\,\frac{dR}{dx_i^{j+1}} + \frac{\partial x_{i+1}}{\partial u_i^j}\,\frac{dR}{dx_{i+1}} = H_u(z_i^j, p_{i+1}, p_i^{j+1}). \tag{1.18}$$

These formulas do not differ by much from those found for the
Euler scheme. The recalculation of the "impulses" needs more
complex formulas.
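The recurrences (1.17) and the derivative formula (1.18) translate almost line by line into a backward sweep over steps and stages. The sketch below is an illustration under assumptions: a scalar state, a constant step `h`, autonomous hypothetical test functions `f`, `B`, `b`, and an independent control value `w[i][s]` at every stage. As before, `beta` holds $\beta_0, \ldots, \beta_{p-1}$ and the extra coefficient $\beta_p = 0$ is appended inside `gradient`.

```python
import math

# Hypothetical test functions.
def f(x, u):   return math.sin(x) + u
def f_x(x, u): return math.cos(x)
def f_u(x, u): return 1.0
def B(x, u):   return 0.5 * (x * x + u * u)
def B_x(x, u): return x
def B_u(x, u): return u
def b(x):      return x * x
def b_x(x):    return 2.0 * x


def forward(x1, w, h, g, beta):
    """Apply (1.13)-(1.14); return the stage states, x_q and R."""
    stages, x, R = [], x1, 0.0
    for ui in w:
        xs, fs = [], []
        for s in range(len(g)):
            xs.append(x if s == 0 else x + beta[s] * h * fs[s - 1])
            fs.append(f(xs[s], ui[s]))
            R += h * g[s] * B(xs[s], ui[s])
        stages.append(xs)
        x = x + h * sum(gs * fv for gs, fv in zip(g, fs))
    return stages, x, R + b(x)


def gradient(x1, w, h, g, beta):
    """Backward sweep: impulses by (1.17), derivatives dR/du_i^j by (1.18)."""
    stages, xq, R = forward(x1, w, h, g, beta)
    beta_ext = list(beta) + [0.0]              # additional coefficient beta_p = 0
    p_next = b_x(xq)                           # p_q
    grad = [[0.0] * len(g) for _ in w]
    for i in reversed(range(len(w))):
        p_stage, total = 0.0, 0.0              # p_i^{p+1} is multiplied by beta_p = 0
        for s in reversed(range(len(g))):
            xs, us = stages[i][s], w[i][s]
            c = g[s] * p_next + beta_ext[s + 1] * p_stage
            grad[i][s] = h * (g[s] * B_u(xs, us) + f_u(xs, us) * c)
            p_stage = h * (g[s] * B_x(xs, us) + f_x(xs, us) * c)
            total += p_stage
        p_next = p_next + total                # p_i = p_{i+1} + sum of p_i^j
    return R, grad
```

Comparing `grad` with central differences of `forward` over each `w[i][s]` is a convenient way to check such a routine before it is used inside a minimization method.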

From the general relations one can easily obtain formulas for
computing the derivatives for each particular integrating scheme.
For the Euler scheme with recalculation (1.15), we have, for
example,

$$p_i^1 = \frac{h_i}{2}\,f_x(z_i^1)\,p_i^2, \qquad p_i^2 = h_i\left[B_x(z_i^2) + f_x(z_i^2)\,p_{i+1}\right],$$

$$\frac{dR}{du_i^1} = \frac{h_i}{2}\,f_u(z_i^1)\,p_i^2, \qquad \frac{dR}{du_i^2} = h_i\left[B_u(z_i^2) + f_u(z_i^2)\,p_{i+1}\right],$$

$$p_i = p_{i+1} + p_i^1 + p_i^2.$$





Compared to the Euler scheme, the number of points at which
the control vector is sought has doubled. One can set
$u_i^1 = u_i^2 = u_i$ everywhere and note that in using the Euler scheme
with recalculation, quantities of order $h_i^3$ have been discarded
on each integration step. We determine the vectors $p_i$ and the
derivatives of $R$ with the same error. We thus obtain


pookiei ly Pate 2) Diar + By (zi), 
Por Bh Por FB 


ic 5 hy [Fa (23) ) Piar +B, (23). 


Here all the partial derivatives $f_x$, $f_u$, $B_x$, $B_u$ have to be
computed with error of order $h_i^2$ in order to ensure error of order
$h_i^3$ in determining $dR/du_i$.


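For the Runge-Kutta scheme (1.13), when the control is
constant on the integration interval, we have

$$\begin{aligned}
p_i = p_{i+1} &+ \sum_{s=1}^{4} M_x(z_i^s) + h_i \sum_{s=2}^{4} \beta_{s-1}\,f_x(z_i^{s-1})\,M_x(z_i^s)
 + h_i^2 \sum_{s=3}^{4} \beta_{s-2}\beta_{s-1}\,f_x(z_i^{s-2})\,f_x(z_i^{s-1})\,M_x(z_i^s)\\
 &+ h_i^3\,\beta_1\beta_2\beta_3\,f_x(z_i^1)\,f_x(z_i^2)\,f_x(z_i^3)\,M_x(z_i^4),\\
\frac{dR}{du_i} = {} &\sum_{s=1}^{4} M_u(z_i^s) + h_i \sum_{s=2}^{4} \beta_{s-1}\,f_u(z_i^{s-1})\,M_x(z_i^s)
 + h_i^2 \sum_{s=3}^{4} \beta_{s-2}\beta_{s-1}\,f_u(z_i^{s-2})\,f_x(z_i^{s-1})\,M_x(z_i^s)\\
 &+ h_i^3\,\beta_1\beta_2\beta_3\,f_u(z_i^1)\,f_x(z_i^2)\,f_x(z_i^3)\,M_x(z_i^4),\\
M(z_i^s) = {} &g_s h_i\left[B(z_i^s) + \left\langle f(z_i^s),\ p_{i+1}\right\rangle\right].
\end{aligned}$$
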


One can similarly obtain formulas for other integration schemes.
Using integration schemes of the form (1.13) for $p \ge 2$, one
presumes that the function $f(x,u,t)$ has bounded derivatives in
all the arguments on each integration interval. If within an
integration interval the control $u(t)$ changes sharply (by
quantities of order 1), the accuracy of the computations
deteriorates. Hence, in finding the optimal control, one needs to
check the position of grid points with respect to $t$ and, when
needed, either change their positions or limit the size of
variation of the control on each integration interval. The
simplest of all things to do is to assume that the control is
constant within each interval. This can be done in solving many
practical problems, when the system (1.1) has to be integrated
with high accuracy, whereas the optimal control need be computed
only coarsely, because usually the optimal control can be
implemented only approximately and therefore the sampling interval
for the control need not be too small.


If the control is fixed on the integration interval, then
$u_i^1 = u_i^2 = \ldots = u_i^p = u_i$ and the vector $w$ may be viewed as
the set $u_1, u_2, \ldots, u_q$. For computing the derivatives with
respect to $u_i$ one needs the following formula instead of (1.18):

$$\frac{dR(x(w), w)}{du_i} = \sum_{j=1}^{p} H_u(z_i^j, p_{i+1}, p_i^{j+1}).$$



This approach may be carried further, assuming the control is
constant on several integration steps. This enables us to lower the
dimension of the vector $w$, lowering at the same time the
accuracy of solving the optimization problem. In particular, if we
assume that the control is constant everywhere, then
$u_1 = u_2 = \ldots = u_q = u$ and

$$\frac{dR(x(w), w)}{du} = \sum_{i=1}^{q-1} \sum_{j=1}^{p} H_u(z_i^j, p_{i+1}, p_i^{j+1}) + \frac{\partial b(z_q)}{\partial u_q}.$$

Actually, in this case the vector $u$ becomes a control parameter,
rather than a control. Analogous formulas for the Euler scheme
will be given in Section 6.6.

Formulas for computing the derivatives of the function $R$
with respect to the components of the vector $u_i$ are needed in
the sequel for minimizing $R$ numerically with respect to $w$.
Hence the accuracy of determining the derivatives ought to be
coordinated with the accuracy of the minimization process. In
particular, in carrying out the coarse preliminary computations, in
the formulas for determining the gradients and the impulses one
can discard small terms proportional to high powers of $h_i$, thus
easing the computations. Other simplifying techniques are also
feasible. For example, one can assume that the control $u(t)$ is
a specified function of $t$ within the integration interval. It
is not hard to obtain formulas for differentiation in this case
as well. Much more complex are the formulas for the second
derivatives of $R$; they can be found in Grachev and Evtushenko [5].


2. NECESSARY AND SUFFICIENT CONDITIONS FOR A MINIMUM 
1. THE STATEMENT OF THE PROBLEM


We will consider the problem of optimal control with "mixed"
constraints on state and controls. For the sake of simplicity, we
study the case where the system (1.1) is integrated by the Euler
scheme (1.2) and the control must satisfy the mixed constraints
along the trajectory

$$\Gamma^1(x_i, u_i, t_i) = 0, \qquad \Gamma^2(x_i, u_i, t_i) \le 0 \tag{2.1}$$

and at the end of the trajectory

$$\Gamma^3(x_q, u_q, t_q) = 0, \qquad \Gamma^4(x_q, u_q, t_q) \le 0. \tag{2.2}$$

By (1.2), to each complete control vector
$w = [u_1, u_2, \ldots, u_q] \in E^{rq}$ there corresponds a unique
complete state vector $x = [x_1, x_2, \ldots, x_q] \in E^{nq}$. Hence we
write $x = x(w)$. We write the conditions (2.1) and (2.2) in concise
form:

$$g(x(w), w) = 0, \qquad h(x(w), w) \le 0, \tag{2.3}$$


where the vector functions $g$ and $h$ are a union of constraints
of equality- and inequality-type, respectively, along as well as
at the end of the trajectory:

$$g(x, w) = [\Gamma^1(z_1), \Gamma^1(z_2), \ldots, \Gamma^1(z_{q-1}), \Gamma^3(z_q)],$$
$$h(x, w) = [\Gamma^2(z_1), \Gamma^2(z_2), \ldots, \Gamma^2(z_{q-1}), \Gamma^4(z_q)],$$
$$z_i = [x_i, u_i, t_i].$$

One can assume without loss of generality that these functions
define mappings $g: E^{nq} \times E^{rq} \to E^{e}$, $h: E^{nq} \times E^{rq} \to E^{c}$.

We say that the control vector $w$ is feasible if the vectors
$w$ and $x(w)$ are such that the conditions (2.3) are satisfied.
The feasible set $W$ of complete control vectors can be defined
in standard fashion:

$$W = \{w \in E^{rq}:\ g(x(w), w) = 0,\ h(x(w), w) \le 0\},$$


emphasizing that $g$ and $h$ are composite functions of $w$. The
partial derivatives of $g$ and $h$ with respect to $x_i$ and $u_i$ are
simply expressed in terms of the original functions:

$$\frac{\partial g}{\partial x_i} = \frac{\partial \Gamma^1(z_i)}{\partial x_i} = \Gamma^1_x(z_i), \qquad
\frac{\partial h}{\partial u_i} = \frac{\partial \Gamma^2(z_i)}{\partial u_i} = \Gamma^2_u(z_i),$$

$$\frac{\partial g}{\partial x_q} = \frac{\partial \Gamma^3(z_q)}{\partial x_q} = \Gamma^3_x(z_q), \qquad
\frac{\partial h}{\partial u_q} = \frac{\partial \Gamma^4(z_q)}{\partial u_q} = \Gamma^4_u(z_q).$$

Moreover, in the sequel the new fact (not included in (2.3)) that
the functions defining the constraints for the $i$-th step depend
only on $x_i$, $u_i$, $t_i$ will be exploited to significantly simplify
the calculations.

The problem of discrete optimal control consists in
finding the complete control vector $w$ and the complete state
vector $x$ such that the conditions (2.1) and (2.2) are satisfied
and the objective function

$$R_1(x, w) = b_1(z_q) + \sum_{i=1}^{q-1} h_i B_1(z_i) \tag{2.4}$$

takes on the smallest possible value.

The vector functions $\Gamma^1$, $\Gamma^2$, $\Gamma^3$, $\Gamma^4$, $f$ and the
functions $b_1$, $B_1$ will be referred to as the functions
"determining" the discrete optimal control problem. We always
assume that a solution of the problem exists. An analogous problem
with state constraints along a trajectory can be formulated as
well for the system (1.1). Intuitively one would expect that for a
large class of systems (1.1) the solutions of both problems will
be close if the integration steps for (1.1) are sufficiently
small. We will not do a rigorous study of this property since it
has already been done, for instance, by Ermol'ev, Gulenko, and
Tsarenko [1], Fedorenko [1], Budak, Berkovich, and Solov'eva [1],
and by many others.


2. NECESSARY AND SUFFICIENT CONDITIONS FOR A MINIMUM


The discrete optimal control problem is a special nonlinear
programming problem. Hence one can obtain extremality conditions
as well as numerical methods by using the well-known results of
nonlinear programming theory. Let us use the Lagrangian

$$R(x, w, \mu, v) = b_1(z_q) + \sum_{i=1}^{q-1} h_i B_1(z_i) + \langle \mu, g(x, w)\rangle + \langle v, h(x, w)\rangle \tag{2.5}$$

with the Lagrange multipliers $\mu \in E^{e}$, $v \in E^{c}$.


This Lagrangian is of the same form as (1.3); hence if the
functions defining the problem are differentiable with respect to
the components of the vectors $x$ and $w$, then we can apply to (2.5)
the differentiation formulas obtained in the preceding section.
The necessary and sufficient conditions for minima in nonlinear
programming problems given in Sections 1.6 and 1.7 carry over
almost verbatim to our problem here. Hence we limit ourselves to
recalling a few results only.

Let there exist a complete control vector $w_*$, the
corresponding complete state vector $x_* = x(w_*)$ and dual vectors
$\mu_*$, $v_* \ge 0$ such that for any $w$, $\mu$, $v \ge 0$ we have the
saddle-point conditions:

$$R(x_*, w_*, \mu, v) \le R(x_*, w_*, \mu_*, v_*) \le R(x(w), w, \mu_*, v_*).$$

Then, by Theorem 1.6.1, the vector $w_*$ is a solution of the
discrete optimal control problem.

By Theorem 1.6.2, if the discrete optimal control problem is
a convex programming problem and Slater's or Karlin's constraint
qualifications hold, then the Lagrangian $R$ has saddle points.

By Theorem 1.7.1, if, in addition, the functions defining the
problem are differentiable with respect to the components of the
vectors $x$ and $w$ and the Arrow-Hurwicz-Uzawa condition is
satisfied, then in order that the vector $w_*$ be a solution of the
discrete optimal control problem, it is necessary that there exist
Lagrange multipliers $\mu_*$, $v_* \ge 0$ such that the triplet
$[w_*, \mu_*, v_*]$ is a Kuhn-Tucker point, i.e.,

$$\frac{dR(x(w_*), w_*, \mu_*, v_*)}{dw} = 0, \qquad g(x(w_*), w_*) = 0,$$
$$h(x(w_*), w_*) \le 0, \qquad v_*^j\,h^j(x(w_*), w_*) = 0, \qquad j \in [1:c]. \tag{2.6}$$



Here the components of the derivative $dR/dw$ are found by the
formulas derived in the preceding section:

$$\frac{dR(x(w_*), w_*, \mu_*, v_*)}{du_i} = H_u(z_i^*, p_{i+1}, \mu_i^*, v_i^*), \qquad z_i^* = [x_i^*, u_i^*, t_i],$$

$$H(z_i, p_{i+1}, \mu_i, v_i) = h_i B_1(z_i) + \langle F(z_i), p_{i+1}\rangle + \langle \mu_i, \Gamma^1(z_i)\rangle + \langle v_i, \Gamma^2(z_i)\rangle, \tag{2.7}$$

$$p_i = H_x(z_i^*, p_{i+1}, \mu_i^*, v_i^*), \qquad
p_q = \frac{\partial}{\partial x_q}\left[b_1(z_q^*) + \langle \mu_q^*, \Gamma^3(z_q^*)\rangle + \langle v_q^*, \Gamma^4(z_q^*)\rangle\right],$$

where the vectors $\mu_i$, $v_i$ are the components of the vectors $\mu$
and $v$; their dimensions coincide respectively with those of the
vector functions $\Gamma^1$ and $\Gamma^2$ if $1 \le i \le q-1$, and
$\Gamma^3$ and $\Gamma^4$ if $i = q$.


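Introduce the cone

$$K(w_*, v_*) = \left\{\bar w:\ \bar w^T \frac{dg(x(w_*), w_*)}{dw} = 0,\quad
\bar w^T \frac{dh^j(x(w_*), w_*)}{dw} = 0,\quad
\bar w^T \frac{dh^s(x(w_*), w_*)}{dw} \le 0\right\},$$

where

$$j \in \sigma(w_*, v_*), \qquad s \in \sigma(w_*) \setminus \sigma(w_*, v_*),$$
$$\sigma(w_*) = \{k \in [1:c]:\ h^k(x(w_*), w_*) = 0\},$$
$$\sigma(w_*, v_*) = \{k \in [1:c]:\ v_*^k > 0,\ k \in \sigma(w_*)\}.$$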


We use McCormick's Theorem 1.7.2. Suppose that the functions
defining the problem are twice differentiable with respect to the
components of the vectors $x$ and $w$, and there exist vectors
$w_*$, $x_* = x(w_*)$, $\mu_*$, $v_* \ge 0$ such that the conditions (2.6)
are satisfied and the matrix of the second derivatives

$$\frac{d^2 R(x(w_*), w_*, \mu_*, v_*)}{dw^2}$$

defined by (1.12) is positive definite on the cone $K(w_*, v_*)$.
Then $w_*$ is an isolated local solution of the discrete optimal
control problem.

In these assertions, the first and second derivatives of the
functions defining the problem are computed with the aid of
auxiliary variables (impulses), which is somewhat unusual in
nonlinear programming. However, this is only a technicality and
has no effect on the matter. Various schemes of integrating (1.1)
yield diverse formulas for computing the derivatives without
changing the form of necessary and sufficient conditions for an
extremum.

A similar situation arises in describing and proving numerical
methods of nonlinear programming, used to solve discrete optimal
control problems. In the next section we shall dwell only briefly
on basic numerical methods, without proving the convergence, since
they have been extensively treated in the preceding chapters.


3. NUMERICAL METHODS BASED ON THE REDUCTION TO 

NONLINEAR PROGRAMMING PROBLEMS 
The discrete optimal control problem stated in Section 6.2 is a
particular case of a nonlinear programming problem. It involves
relatively simple computations of the derivatives of the function
$R$ with respect to the components of $w$. This property suggests
many ways of effectively using the nonlinear programming methods
based on the computation of first derivatives.

Most of the methods follow a common pattern: given a vector
$w$, the values of the objective function and of the constraints
are computed and the auxiliary function $R$ of the form (1.3) is
minimized with respect to $w$ by unconstrained minimization
methods. Numerical methods differ from each other with respect to
the construction of the functions $R$ and the rules for changing
them during the iterations. The choice of method for integrating
the system (1.1) affects only the formulas for computing $R$, $g$, $h$
and their derivatives. In computer programs, it is convenient to
construct the procedures for computing $R$, $g$, $h$ and finding the
derivatives as separate modules. As the schemes of integrating
the system (1.1) change, only these modules change in the
numerical algorithms; other blocks of the program remain intact.
The choice of integration scheme is thus independent of the choice
of optimization method: schemes of high-order accuracy may be
employed in integration while relatively coarse methods are used
in optimization, and vice versa.

We shall describe several of the most commonly used methods for
solving discrete optimal control problems. Numerical computations
of test problems are given in Section 6.7. To simplify the
references, each method is designated OPTS, S denoting the number
of the computer program realizing the method. In solving complex
optimal control problems, several distinct optimization methods
are usually used. Methods having a relatively large region of
convergence are used first, followed by rapidly converging methods
upon narrowing to a sufficiently small neighborhood of the
solution. According to this procedure, each operating method
"prepares" the necessary data for a subsequent method to operate.

In each numerical method, an iterative process of solving the
problem of Section 6.2 is used. The number of the iteration is
labeled $k$. We show how the complete control vector $w$ changes
on the $k$-th iteration, and specify the rule for changing the
vector of dual variables $\{\mu, v\}$ in some methods.

OPT41. EXTERIOR PENALTY-FUNCTION METHOD (see Section 3.1). We
compose the auxiliary function

$$R(x, w, \tau) = R_1(x, w) + \tau\left[\sum_{i=1}^{e} [g^i(x, w)]^2 + \sum_{j=1}^{c} \psi(h^j(x, w))\right], \tag{3.1}$$

where the objective function $R_1$ is given by (2.4) and the penalty
function $\psi$ is defined by (3.1.6).

For any monotonic increasing sequence $\tau_0 < \tau_1 < \tau_2 < \ldots$
one constructs the sequence of vectors $w(\tau_0), w(\tau_1), \ldots$
defined by the approximate solution of the unconstrained
minimization problem

$$w_k = w(\tau_k) \in \operatorname{Arg}\min_{w} R(x(w), w, \tau_k). \tag{3.2}$$


If the functions defining the problem are differentiable with
respect to the components of the vector $x$, then one can
approximately determine the dual vectors

$$\mu_k^i = \frac{\partial R(x(w_k), w_k, \tau_k)}{\partial g^i}, \qquad
v_k^j = \frac{\partial R(x(w_k), w_k, \tau_k)}{\partial h^j},$$

useful for further computations via the methods based on modified
Lagrangians.

If the conditions of Theorem 3.1.1 are satisfied, the method
converges to a solution of the problem. The computing process
depends to a great extent on the policy used to increase the
penalty coefficient $\tau_k$, on the unconstrained minimization method
employed, on the accuracy of solving the auxiliary problems (3.2),
and on many other factors. The accuracy of solving the problems
(3.2) has different interpretations in unconstrained minimization
methods. In particular, in methods involving derivatives of the
objective function, the computations are interrupted upon finding
the control $w_k$ satisfying

$$\left\| \frac{dR(x(w_k), w_k, \tau_k)}{dw} \right\| \le \varepsilon_k.$$

Thus, the second simplified version of the penalty function
method will be used (see Section 3.1).
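To make the pattern concrete, here is a small sketch of the exterior penalty iteration on a toy discrete control problem chosen only for illustration: $x_{i+1} = x_i + h u_i$, $x_1 = 0$, objective $\sum h u_i^2$, and a single terminal equality constraint $x_q - 1 = 0$ (no inequality constraints, so the function $\psi$ does not appear). The inner minimization is plain gradient descent with a fixed step that is an assumption tied to this particular quadratic problem; all function names are hypothetical.

```python
import math


def penalized_gradient(u, h, tau):
    """Gradient of R = sum h*u_i^2 + tau*g^2 computed with impulses."""
    x = 0.0
    for ui in u:
        x += h * ui                      # Euler scheme for dx/dt = u
    g = x - 1.0                          # terminal equality constraint
    p = 2.0 * tau * g                    # p_q; here p_i = p_{i+1} since F_x = 1, B_x = 0
    grad = [2.0 * h * ui + h * p for ui in u]
    return grad, g


def minimize(u, h, tau, eps, max_iter=100000):
    """Gradient descent, stopped when the gradient norm is at most eps."""
    step = 1.0 / (2.0 * h * (1.0 + tau))  # 1/L for this quadratic problem
    for _ in range(max_iter):
        grad, g = penalized_gradient(u, h, tau)
        if math.sqrt(sum(v * v for v in grad)) <= eps:
            break
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u, g


def penalty_method(n_steps=10, taus=(1.0, 10.0, 100.0, 1000.0), eps=1e-10):
    h = 1.0 / n_steps
    u = [0.0] * n_steps
    mu = 0.0
    for tau in taus:                      # increasing penalty coefficients
        u, g = minimize(u, h, tau, eps)   # warm start from the previous control
        mu = 2.0 * tau * g                # dual estimate dR/dg
    return u, mu
```

For this toy problem the exact solution is $u_i = 1$ with multiplier $-2$; the penalized minimizers are $u_i = \tau/(1+\tau)$, so the iterates approach the constraint from outside the feasible set, and the dual estimate can be passed on to a method based on modified Lagrangians.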

If the functions defining the problem are nondifferentiable
in $x$ and $w$, then as the auxiliary function $R$ it is appropriate
to take a function depending on a nondifferentiable penalty
(see Section 3.2).


OPT45. THE FIRST VERSION OF THE OBJECTIVE-FUNCTION PARAMETRIZATION
METHOD (see Section 3.3). We compose the auxiliary function

$$R(x, w, \eta) = [R_1(x, w) - \eta]^2 + \sum_{i=1}^{e} [g^i(x, w)]^2 + \sum_{j=1}^{c} \psi(h^j(x, w)), \tag{3.3}$$

where the function $\psi$ is the same as in (3.1) and $\eta$ is a lower
bound of the optimal value $R_1^*$ of the objective function.
According to the results of Section 3.2, the value of $\eta$ can be
obtained after at least one iteration by the method OPT41.




Let the control $w_{k-1}$ and the value $\eta_k$ be known, and let
the new control $w_k$ be determined from the solution of the
auxiliary problem

$$w_k \in \operatorname{Arg}\min_{w} R(x(w), w, \eta_k). \tag{3.4}$$

The control $w_{k-1}$ is taken as an initial approximation. Using
(3.3.6), we set

$$\eta_{k+1} = \eta_k + \sqrt{R(x(w_k), w_k, \eta_k)}.$$


The computing process stops if at least one of the following
three conditions is satisfied:

$$k = d, \qquad \sum_{i=1}^{e} [g^i(x(w_k), w_k)]^2 + \sum_{j=1}^{c} \psi(h^j(x(w_k), w_k)) < \varepsilon, \qquad
\eta_{k+1} - \eta_k < \varepsilon\,(1 + |\eta_k|).$$

Here $d$ is the prescribed maximal number of iterations and $\varepsilon$
is the accuracy of solving the unconstrained minimization problem.
The quantities $d$ and $\varepsilon$ are assigned by the user.

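If the functions defining the problem are differentiable with
respect to the components of the vector $x$, one can find the dual
vectors

$$\mu_k^i = \frac{1}{\lambda}\,\frac{\partial R(x(w_k), w_k, \eta_k)}{\partial g^i}, \qquad
v_k^j = \frac{1}{\lambda}\,\frac{\partial R(x(w_k), w_k, \eta_k)}{\partial h^j}, \qquad
\lambda = 2\left[R_1(x(w_k), w_k) - \eta_k\right]. \tag{3.5}$$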


OPT46. THE SECOND VERSION OF THE OBJECTIVE-FUNCTION
PARAMETRIZATION METHOD. The auxiliary function (3.3) is
constructed, problem (3.4) is solved in the $k$-th iteration, and the
parameter $\eta_k$ is changed according to (3.3.10), which in this
case has the form

$$\eta_{k+1} = \eta_k + \frac{R(x(w_k), w_k, \eta_k)}{R_1(x(w_k), w_k) - \eta_k}.$$

The dual vectors are defined by (3.5).
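The two rules for changing $\eta$ can be compared on a one-dimensional toy problem, chosen here purely for illustration: minimize $R_1(w) = w^2$ subject to $g(w) = w - 1 = 0$, so that $R_1^* = 1$ and $\eta_0 = 0$ is a lower bound. The sketch assumes that the auxiliary problem (3.4) can be solved to high accuracy; for this particular problem it is done by bisection on the derivative, which has a single sign change on $[0, 2]$. All names are hypothetical.

```python
import math


def R1(w): return w * w
def g(w):  return w - 1.0


def aux(w, eta):
    """Auxiliary function of the form (3.3) without inequality constraints."""
    return (R1(w) - eta) ** 2 + g(w) ** 2


def aux_dw(w, eta):
    return 4.0 * w * (R1(w) - eta) + 2.0 * g(w)


def argmin_aux(eta, lo=0.0, hi=2.0, iters=200):
    """Solve the auxiliary problem by bisection on the derivative."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if aux_dw(mid, eta) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def update_opt45(eta, w, value):
    return eta + math.sqrt(value)


def update_opt46(eta, w, value):
    return eta + value / (R1(w) - eta)


def parametrization(update, eta=0.0, d=200, eps=1e-12):
    """Repeat: minimize the auxiliary function, then raise eta."""
    for k in range(d):
        w = argmin_aux(eta)
        value = aux(w, eta)
        if value < eps:
            break
        eta = update(eta, w, value)
    return w, eta, k
```

On this example both rules drive $\eta$ toward $R_1^*$ from below, and the second rule takes noticeably larger steps, which is in line with its stronger dependence on the accuracy of the inner minimization.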

The essential drawback of these two versions of the
objective-function parametrization method is the requirement that
the unconstrained minimization be performed with high accuracy.
Only in this case will the necessary condition $\eta_k \le R_1^*$ be
satisfied. In the second version, due to the fact that $\eta$ changes
more drastically, this requirement is really crucial. Hence, in
solving the auxiliary unconstrained minimization problems, a
higher accuracy for the computation is required than with the
other method.

The penalty function method and the objective-function
parametrization method are most useful when the initial
approximation is known to be coarse. Usually, the computations
start with these methods and the initial values of the dual
variables are determined. It is, however, difficult to solve the
problem with high accuracy using these methods.

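OPT53. THE METHOD OF MODIFIED LAGRANGIANS. The simple iteration
method (4.3.21) is used. The modified Lagrangian has the form

$$R(x, w, \mu, v) = R_1(x, w) + \sum_{i=1}^{e} \left[\mu^i + \frac{\tau}{2}\,g^i(x, w)\right] g^i(x, w)
+ \frac{1}{2\tau} \sum_{j=1}^{c} \left[\left(v^j + \tau h^j(x, w)\right)_+\right]^2.$$

In the $k$-th iteration the following is performed:

$$w_k \in \operatorname{Arg}\min_{w} R(x(w), w, \mu_k, v_k),$$

$$\mu_{k+1} = \mu_k + \tau\,g(x_k, w_k), \qquad v_{k+1} = \left(v_k + \tau\,h(x_k, w_k)\right)_+, \qquad x_k = x(w_k).$$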
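The multiplier iteration can be illustrated on the same kind of toy problem as a hedged sketch: $x_{i+1} = x_i + h u_i$, $x_1 = 0$, objective $\sum h u_i^2$, and one terminal equality constraint $g = x_q - 1 = 0$. Only the equality part of the modified Lagrangian is exercised here (the inequality term is omitted), and the fixed descent step is an assumption valid for this quadratic example; the names are hypothetical.

```python
def solve_inner(u, h, mu, tau, eps=1e-12, max_iter=100000):
    """Minimize sum h*u_i^2 + (mu + tau/2 * g) * g over u by gradient descent."""
    step = 1.0 / (h * (2.0 + tau))        # 1/L for this quadratic problem
    g = h * sum(u) - 1.0
    for _ in range(max_iter):
        g = h * sum(u) - 1.0              # g = x_q - 1 with x_1 = 0
        p = mu + tau * g                  # impulse p_q; constant along the trajectory
        grad = [2.0 * h * ui + h * p for ui in u]
        if max(abs(v) for v in grad) <= eps:
            break
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u, g


def modified_lagrangian(n_steps=10, tau=10.0, iterations=30):
    h = 1.0 / n_steps
    u, mu = [0.0] * n_steps, 0.0
    for _ in range(iterations):
        u, g = solve_inner(u, h, mu, tau)
        mu = mu + tau * g                 # simple iteration for the dual variable
    return u, mu
```

In contrast to the penalty method, the coefficient $\tau$ stays fixed here; on this example the error in the multiplier shrinks by the factor $2/(2+\tau)$ per iteration.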




A drawback of this method is that $R$ has no continuous
second derivatives in $w$. This cuts down the number of
unconstrained minimization methods usable for finding $w_k$. In this
respect, the next method is more advantageous:

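OPT55. THE FIRST VERSION OF THE SIMPLE ITERATION METHOD. The
method (4.3.20) is used. The modified Lagrangian has the form

$$R(x, w, \mu, v) = R_1(x, w) + \sum_{i=1}^{e} \left[\mu^i + \frac{\tau}{2}\,g^i(x, w)\right] g^i(x, w)
+ \sum_{j=1}^{c} \frac{v^j}{\tau} \times
\begin{cases}
1 + \tau h^j(x, w) + [\tau h^j(x, w)]^2 + [\tau h^j(x, w)]^3 & \text{if } h^j \ge 0,\\[1ex]
[1 - \tau h^j(x, w)]^{-1} & \text{if } h^j < 0.
\end{cases}$$

The method consists in the following:

$$w_k \in \operatorname{Arg}\min_{w} R(x(w), w, \mu_k, v_k),$$

$$\mu_{k+1} = \mu_k + \tau\,g(x(w_k), w_k), \qquad
v_{k+1}^j = v_k^j \times
\begin{cases}
1 + 2\tau h_k^j + 3[\tau h_k^j]^2 & \text{if } h_k^j \ge 0,\\[1ex]
[1 - \tau h_k^j]^{-2} & \text{if } h_k^j < 0.
\end{cases}$$

Here $h_k = h(x(w_k), w_k)$.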
To implement this and the preceding methods, it is necessary
to know approximate values for the components of the dual vectors.
These methods are especially efficient in computations in the
neighborhood of the solution. As was shown in Chapter 4, they
are in fact versions of the simple iteration method. A higher
rate of convergence can be attained by using a modification of
Newton's method.


OPT8. NEWTON'S METHOD. The simplest version (4.1.11) is
implemented. The modified Lagrangian has the form

$$R(x, w, \mu, v) = R_1(x, w) + \sum_{i=1}^{e} \mu^i g^i(x, w) + \sum_{j=1}^{c} (v^j)^2 h^j(x, w).$$

The iterative process involves solving the following system of
linear equations

$$\begin{aligned}
\frac{d^2R(x_k, w_k, \mu_k, v_k)}{dw^2}\,(w_{k+1} - w_k) + \frac{dg(x_k, w_k)}{dw}\,(\mu_{k+1} - \mu_k)
 + 2\sum_{j=1}^{c} v_k^j\,\frac{dh^j(x_k, w_k)}{dw}\,(v_{k+1}^j - v_k^j) &= -\frac{dR(x_k, w_k, \mu_k, v_k)}{dw},\\
\left[\frac{dg(x_k, w_k)}{dw}\right]^T (w_{k+1} - w_k) &= -g(x_k, w_k),\\
v_k^j \left\langle \frac{dh^j(x_k, w_k)}{dw},\ w_{k+1} - w_k\right\rangle + h^j(x_k, w_k)\,(v_{k+1}^j - v_k^j) &= -v_k^j\,h^j(x_k, w_k),
\end{aligned} \tag{3.6}$$

where $j \in [1:c]$, $x_k = x(w_k)$, and the matrix of the second
derivatives of $R$ with respect to $w$ is defined from the recurrence
formulas (1.12). In implementing this method, it is necessary to
store a symmetric matrix of dimension $rq \times rq$, which limits the
size of the discrete optimal control problem being solved.

Some difficulties arise in solving the linear system (3.6)
when the constraints (2.3) do not depend explicitly on the
components of the vector $w$, since in this case the determinant of
the system (3.6) may be zero. For example, if the inequality
constraint has the form $h(x) \le 0$, then for the discrete
approximation (1.2) we have $h(x_1) \le 0$ on the first step. The
value $h(x_1)$ does not depend on the complete vector $w$, so the
gradient $dh(x_1)/dw = 0$; if, in addition, $h(x_1) = 0$, then one
column in the matrix of the second derivatives of $R$ with respect to
$[w, u, v]$ is equal to zero, which makes the matrix singular. In
this case, the constraint $h(x_1) \le 0$ should be dropped and all
the sequential constraints be represented as
$$\bar h_i = h(x_i + h_i f(x_i, u_i, t_i)), \quad i \in [1:q-2], \qquad (3.7)$$


and then use the usual computational formulas. 
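OPT1. THE LINEARIZATION METHOD. In the $k$th iteration, the cost
function (2.4) and the constraints are linearized:

$$R_1(x(w), w) \approx R_1(x(w_k), w_k) + \left\langle \frac{dR_1(x(w_k), w_k)}{dw}, \delta w \right\rangle,$$

$$g(x(w), w) \approx g(x(w_k), w_k) + \left[\frac{dg(x(w_k), w_k)}{dw}\right]^T \delta w,$$

$$h(x(w), w) \approx h(x(w_k), w_k) + \left[\frac{dh(x(w_k), w_k)}{dw}\right]^T \delta w,$$

where $\delta w = w - w_k$, and one formulates the following quadratic
programming problem of finding the minimum with respect to
$\delta w$ of the function

$$\left\langle \frac{dR_1(x(w_k), w_k)}{dw}, \delta w \right\rangle + a \langle \delta w, \delta w \rangle \qquad (3.8)$$

satisfying the conditions

$$g(x(w_k), w_k) + \left[\frac{dg(x(w_k), w_k)}{dw}\right]^T \delta w = 0,$$
$$h(x(w_k), w_k) + \left[\frac{dh(x(w_k), w_k)}{dw}\right]^T \delta w \le 0, \qquad (3.9)$$

where $a$ is a positive coefficient. Upon finding the optimal
value $\delta w$, we set $w_{k+1} = w_k + \alpha\,\delta w$, while the
step $\alpha$ is obtained by minimizing the nondifferentiable penalty
function

$$P = R_1(x(w_{k+1}), w_{k+1}) + \tau \left[ \sum_{i=1}^{e} |g^i(x(w_{k+1}), w_{k+1})| + \sum_{j=1}^{c} h_+^j(x(w_{k+1}), w_{k+1}) \right],$$

where $\tau$ is sufficiently large.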
If one sets $a = 0$, then (3.8), (3.9) becomes a linear
programming problem, and in this case one should require in addition
that the components of $\delta w$ be bounded. One can use all the other
versions of the method described in Section 3.5. A special
version of the linearization method oriented toward solving optimal
control problems has been suggested by Fedorenko [1].

OPT7. MODIFICATION OF THE ARROW-HURWICZ METHOD. Following the
arguments of Section 4.1, we form the simplest modification of
the Lagrangian, setting

$$R(x, w, u, v) = R_1(x, w) + \langle u, g(x, w) \rangle + \sum_{j=1}^{c} (v^j)^2 h^j(x, w).$$

The method consists in constructing the sequence

$$w_{k+1} = w_k - \alpha \frac{dR(x(w_k), w_k, u_k, v_k)}{dw},$$
$$u_{k+1} = u_k + \varepsilon\alpha\, g(x(w_k), w_k),$$
$$v_{k+1}^j = v_k^j + 2\varepsilon\alpha\, v_k^j h^j(x(w_k), w_k),$$

where the step $\alpha$ must be sufficiently small, and $\varepsilon$
is either equal to one or sufficiently small. The method has a
rather low rate of convergence.
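
As a rough illustration of the OPT7 iteration, the sketch below applies the three update formulas to a small static problem. The test problem, the starting point, the step `alpha` and the iteration count are all assumptions chosen for this example; they are not taken from the DISO library, and no optimal control dynamics are involved.

```python
# Sketch of the modified Arrow-Hurwicz iteration on a hypothetical toy problem:
#   minimize R1(w) = (w1 - 2)^2 + (w2 - 1)^2
#   subject to g(w) = w1 + w2 - 2 = 0  and  h(w) = w1 - 1.2 <= 0.
# The modified Lagrangian is R = R1 + u*g + v^2 * h.

def grad_R1(w):
    return [2.0 * (w[0] - 2.0), 2.0 * (w[1] - 1.0)]

def g(w):
    return w[0] + w[1] - 2.0

def h(w):
    return w[0] - 1.2

GRAD_G = [1.0, 1.0]   # gradient of g (constant, since g is linear)
GRAD_H = [1.0, 0.0]   # gradient of h (constant, since h is linear)

def arrow_hurwicz(w, u, v, alpha=0.01, eps=1.0, iters=50000):
    for _ in range(iters):
        gr = grad_R1(w)
        # dR/dw = dR1/dw + u * dg/dw + v^2 * dh/dw
        dR = [gr[i] + u * GRAD_G[i] + v * v * GRAD_H[i] for i in range(2)]
        gk, hk = g(w), h(w)          # evaluated at the old point w_k
        w = [w[i] - alpha * dR[i] for i in range(2)]
        u = u + eps * alpha * gk
        v = v + 2.0 * eps * alpha * v * hk
    return w, u, v

w, u, v = arrow_hurwicz([0.0, 0.0], 0.0, 1.0)
print(w, u, v * v)
```

For this toy problem the Kuhn-Tucker point is $w = (1.2, 0.8)$ with multipliers $u = 0.4$ and $(v)^2 = 1.2$. The large number of small steps needed to reach it is in line with the remark about the low rate of convergence.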

This will end our description of the basic methods. In similar
fashion, numerous other algorithms of nonlinear programming
can be carried over to solving discrete optimal control problems.
As mentioned earlier, programs implementing these methods can be
modified by changing individual blocks so as to make them suitable
for computing diverse schemes of integrating the system (1.1).

In the programs OPT41, OPT45, OPT46, OPT53, OPT55, OPT1, OPT8,
OPT7, the system (1.1) is integrated by the Euler scheme. In
Section 6.7, we shall be referring to the programs OPT413, OPT553,
using the methods described in OPT41 and OPT55, respectively; but
the Euler method with recalculation (1.15) will be used to
integrate the system (1.1).


4. DISCRETE MINIMUM PRINCIPLES


1. BASIC DEFINITIONS


As seen from the numerical methods described in Section 6.3, in
solving discrete optimal control problems one often needs to find
the unconstrained minimum of the auxiliary function $R(x(w), w)$.
The vector $w$ usually has large size, thus minimization of
$R(x(w), w)$ with respect to $w$ is a rather complex problem. One
can use an extensive library of standard programs implementing
diverse methods of unconstrained minimization of multivariable
functions. However, the problem we are considering has a special
structure that is absent from the general unconstrained
minimization problem and is therefore not exploited by these
methods. The fact that the minimization problem is connected with
the solution of a discrete optimal control problem makes it possible
in a number of cases to use special properties which, for continuous
systems, are embodied in Pontryagin's maximum principle. Their
analogs in nonlinear programming are the results presented in Section
1.4. The objective of this section is to derive for the system
(1.2) necessary conditions for a minimum of the function $R$ that
are analogous to Pontryagin's maximum principle. Numerical
methods based on these results will be examined in the next section.
We assume that the multistep process is described by the
relation (1.2). The problem consists in finding the $i$th component
$u_i$ of the vector $w$ yielding the minimum of the composite
function

$$R(x(w), w) = b(z_q)$$

over all possible values $u_i$ belonging to some compact set $U$.
All the other components of $w$ will be assumed to be fixed. The
subscript $i$ is arbitrary, $1 \le i \le q-1$. Let $W_*$ denote the
solution set of our problem:

$$W_* = \operatorname{Arg\,min}_{u_i \in U} b(z_q). \qquad (4.1)$$

We assume in the sequel that this set is nonempty.
If the mapping $F(z_i)$ and the function $b(z_q)$ depend
continuously on all their arguments, then each vector $u_i \in U$
yields the unique sequence $x_{i+1}, x_{i+2}, \ldots, x_q$. This
operation defines a continuous function $b$ of the vector $x_{i+1}$
and a mapping $x_{i+1}$ as a function of the vector $u_i$. To simplify
the formulas, we denote the vector $x_{i+1}$ by the letter $a$ and the
composite function $b$ of $a$ by $B(a)$. Then

$$R(x(w), w) = B(a) = B(a(u_i)), \qquad a(u_i) = F(x_i, u_i, t_i).$$


The set $W_*$ is representable in the form

$$W_* = \operatorname{Arg\,min}_{u_i \in U} B(a(u_i)).$$

Instead of minimizing $B(a(u_i))$ with respect to the control
vector $u_i \in U$, we consider the problem of minimizing $B(a)$ with
respect to the state vector $a \in \Omega = a(U)$ (in the state space)
and define the set

$$\Omega_* = \operatorname{Arg\,min}_{a \in \Omega} B(a).$$

Obviously, $\Omega_*$ is the image of the set $W_*$ under the mapping
$a(u_i)$: $\Omega_* = a(W_*)$.
If the vector function $F$ and the function $b$ are
differentiable with respect to the components of the complete state
vector, then by the formulas of Section 6.1 we have

$$p_s = H_x(z_s, p_{s+1}), \quad i < s \le q-1, \qquad p_q = \frac{\partial b(z_q)}{\partial x_q}, \qquad (4.2)$$

with the usual notation $H(z_s, p_{s+1}) = \langle F(z_s), p_{s+1} \rangle$.
Now for each vector $u_i \in U$ one can compute the vectors
$x_{i+1}, x_{i+2}, \ldots, x_q$ and also, from (4.2), find
$p_q, p_{q-1}, \ldots, p_{i+1}$. This operation defines the
single-valued mappings

$$p_{i+1} = p_{i+1}(a) = p_{i+1}(a(u_i)).$$
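
The backward recurrence for the vectors $p_s$ can be sketched in code. The example below assumes a scalar state, the Euler scheme $F(x, u) = x + h f(x, u)$, and a hypothetical right-hand side `f` and terminal function `b`; all of these are illustrative choices rather than anything specified in the text. It computes the derivatives of the cost with respect to each control by the adjoint recurrence and compares them with central finite differences.

```python
# Adjoint (impulse) recurrence for a scalar Euler-discretized system.
# f, b and the numerical data below are hypothetical illustration choices.

def f(x, u):
    return -x * x + u

def f_x(x, u):
    return -2.0 * x

def f_u(x, u):
    return 1.0

def b(x):
    return (x - 1.0) ** 2

def b_x(x):
    return 2.0 * (x - 1.0)

def trajectory(x1, controls, h):
    """States x_1, ..., x_q generated by x_{s+1} = x_s + h f(x_s, u_s)."""
    xs = [x1]
    for u in controls:
        xs.append(xs[-1] + h * f(xs[-1], u))
    return xs

def cost(x1, controls, h):
    return b(trajectory(x1, controls, h)[-1])

def adjoint_gradient(x1, controls, h):
    xs = trajectory(x1, controls, h)
    p = b_x(xs[-1])                      # p_q = db/dx_q
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        # here p holds p_{s+1}; dR/du_s = F_u p_{s+1} = h f_u p_{s+1}
        grad[s] = h * f_u(xs[s], controls[s]) * p
        # p_s = F_x p_{s+1} = (1 + h f_x) p_{s+1}
        p = (1.0 + h * f_x(xs[s], controls[s])) * p
    return grad

def finite_difference_gradient(x1, controls, h, step=1e-6):
    grad = []
    for s in range(len(controls)):
        up = list(controls); up[s] += step
        dn = list(controls); dn[s] -= step
        grad.append((cost(x1, up, h) - cost(x1, dn, h)) / (2.0 * step))
    return grad

controls = [0.5, -0.2, 0.3, 0.1]
g_adj = adjoint_gradient(0.2, controls, 0.1)
g_fd = finite_difference_gradient(0.2, controls, 0.1)
print(g_adj)
print(g_fd)
```

One backward sweep yields all the derivatives at the cost of a single extra pass over the trajectory, whereas the finite-difference check needs two trajectory evaluations per control component.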


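If the function $B(a)$ is defined on an open set containing
$\Omega$ and is differentiable at the point $\bar a \in \Omega$, then
one can introduce the point-set mapping

$$W_1(\bar a) = \operatorname{Arg\,min}_{a \in \Omega} \langle p_{i+1}(\bar a),\ a - \bar a \rangle$$

or, passing to the control space, define for $\bar u_i \in U$ the
multivalued mapping

$$W_2(\bar u_i) = \operatorname{Arg\,min}_{u_i \in U} \langle p_{i+1}(a(\bar u_i)),\ a(u_i) - a(\bar u_i) \rangle.$$

Let $\bar a = a(\bar u_i)$. Then the set $W_1(\bar a)$ is the image of
the set $W_2(\bar u_i)$ under the mapping $a(u_i)$:

$$W_1(\bar a) = a(W_2(\bar u_i)).$$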


If we assume that the mapping $F$ is differentiable with
respect to the components of the complete state vector as well as
the components of the vector $w$, then we can use the formula (1.8)
obtained in Section 6.1, which can be rewritten in the form

$$\frac{dB(a(u_i))}{du_i} = F_u(x_i, u_i, t_i)\, p_{i+1}(a(u_i)) = H_u(x_i, u_i, t_i, p_{i+1}(a(u_i))) = h_i f_u(x_i, u_i, t_i)\, p_{i+1}(a(u_i)).$$

For each vector $\bar u_i \in U$ we define the point-set mapping

$$W_3(\bar u_i) = \operatorname{Arg\,min}_{u_i \in U} \left\langle \frac{dB(a(\bar u_i))}{du_i},\ u_i - \bar u_i \right\rangle.$$
Similarly to Section 1.4, one can pose the problem of finding the
fixed points of the multivalued mappings $W_1(a)$, $W_2(u_i)$,
$W_3(u_i)$ or the problem of solving the corresponding variational
inequalities, i.e., the points satisfying the conditions

$$\bar a \in W_1(\bar a), \quad \langle p_{i+1}(\bar a),\ a - \bar a \rangle \ge 0 \quad \forall a \in \Omega, \qquad (4.3)$$

$$\bar u_i \in W_2(\bar u_i), \quad \langle p_{i+1}(a(\bar u_i)),\ f(x_i, u_i, t_i) - f(x_i, \bar u_i, t_i) \rangle \ge 0 \quad \forall u_i \in U, \qquad (4.4)$$

$$\bar u_i \in W_3(\bar u_i), \quad \langle f_u(x_i, \bar u_i, t_i)\, p_{i+1}(a(\bar u_i)),\ u_i - \bar u_i \rangle \ge 0 \quad \forall u_i \in U. \qquad (4.5)$$

If $\bar u_i \in W_*$, then (4.4) is a discrete analog of Pontryagin's
minimum principle (see Section 1.8). The condition (4.3) implies the
same property, however, in the state space. The condition (4.5)
is a discrete analog of the linearized minimum principle.

We replace the initial problem (4.1) of computing the minimum
of the composite function $b(z_q)$ with respect to $u_i \in U$ by that
of finding vectors $\bar a$, $\bar u_i$ satisfying the conditions
(4.3)-(4.5). First, however, we formulate conditions under which
such a reduction is appropriate.
2. NECESSARY AND SUFFICIENT CONDITIONS FOR A MINIMUM


Let

$$u_i^* \in W_*, \quad a_* = a(u_i^*) \in \Omega_*, \quad \bar u_i \in U, \quad \bar a = a(\bar u_i) \in \Omega.$$

Also, let

$$\Delta(\bar a) = R(x(\bar w), \bar w) - R(x(w_*), w_*) = B(\bar a) - B(a_*),$$

where all the components of the complete control vectors $\bar w$ and
$w_*$ coincide except the $i$th component: for $\bar w$ the vector
$u_i$ is taken as $\bar u_i$; for $w_*$ the vector $u_i$ is taken as
$u_i^*$. Thus the quantity $\Delta$ represents an increment of the
cost function $R$ when the control $u_i^*$ is replaced by $\bar u_i$.
THEOREM 6.4.1. Let the function $B(a)$ be defined on an open set
containing $\Omega$ and suppose there exists a point $\bar u_i \in U$
such that the function $B(a)$ is differentiable at
$\bar a = a(\bar u_i)$. Then:

1. if $\bar u_i \in W_*$ and the set $\Omega$ is convex, then the
conditions (4.3) and (4.4) are satisfied;

2. if $B(a)$ is pseudoconvex at $\bar a$ with respect to $\Omega$ and
either at the point $\bar a$ the condition (4.3) is satisfied or at
the point $\bar u_i$ the condition (4.4) is satisfied, then
$\bar a \in \Omega_*$, $\bar u_i \in W_*$;

3. if the function $B(a)$ is convex on $\Omega$, then for any
$\tilde a \in W_1(\bar a)$, $\tilde u_i \in W_2(\bar u_i)$ we have the
inequalities

$$0 \le \Delta(\bar a) \le \langle p_{i+1}(\bar a),\ \bar a - \tilde a \rangle, \qquad (4.6)$$

$$0 \le \Delta(\bar a) \le H(x_i, \bar u_i, t_i, p_{i+1}(a(\bar u_i))) - H(x_i, \tilde u_i, t_i, p_{i+1}(a(\bar u_i))). \qquad (4.7)$$
Assertion 1 is a necessary condition for a minimum of problem (4.1).
Its proof is the same as that of Theorem 1.4.1. Assertion 2, yielding
sufficient conditions, follows from a similar assertion stated
in Theorem 1.4.3. From the differentiability of $B(a)$ at $\bar a$ we
have

$$B(a) - B(\bar a) = \langle p_{i+1}(\bar a),\ a - \bar a \rangle + \|a - \bar a\|\, \alpha(\bar a, a - \bar a), \qquad (4.8)$$

where $\lim_{a \to \bar a} \alpha(\bar a, a - \bar a) = 0$.
If $B(a)$ is convex, then by Theorem 1.2.5 for any $a \in \Omega$ we
have

$$\langle p_{i+1}(\bar a),\ a - \bar a \rangle \le B(a) - B(\bar a), \qquad p_{i+1}(\bar a) = \frac{dB(\bar a)}{da}.$$

Taking for $a$ the vector $a_*$, we obtain that for any
$\tilde a \in W_1(\bar a)$:

$$B(\bar a) - B(a_*) \le \langle p_{i+1}(\bar a),\ \bar a - a_* \rangle \le \langle p_{i+1}(\bar a),\ \bar a - \tilde a \rangle.$$

We have thus arrived at (4.6). If the right side of this formula
is expressed in terms of the function $H$, then we obtain (4.7). ///


The necessary condition (4.3) can be called a discrete "state"
minimum principle, and (4.4) a discrete "control" minimum principle.
To obtain analogous assertions concerning the condition (4.5), we
define the function

$$\psi(\bar u_i, u_i) = \langle p_{i+1}(a(\bar u_i)),\ F(x_i, u_i, t_i) \rangle = H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i))).$$


THEOREM 6.4.2. Let the composite function $B(a(u_i))$ of $u_i$ be
defined on an open set containing $U$ and suppose there exists a point
$\bar u_i \in U$ such that the function $B(a(u_i))$ is differentiable
at $\bar u_i$. Then:

1. if $\bar u_i \in W_*$ and the set $U$ is convex, then the condition
(4.5) holds;

2. if $\bar u_i \in W_*$, the set $U$ is convex, and
$\psi(\bar u_i, u_i)$ is a pseudoconvex function of $u_i$ at the point
$\bar u_i$ with respect to $U$, then at the point $\bar u_i$ the
condition (4.4) is satisfied and at the point $\bar a = a(\bar u_i)$
the condition (4.3) is satisfied;

3. if at the point $\bar u_i$ either the condition (4.4) is satisfied
and $U$ is convex or the condition (4.5) is satisfied, and in
addition, at the point $\bar u_i$ the function $B(a(u_i))$ is
pseudoconvex in $u_i$ with respect to $U$, then $\bar u_i \in W_*$;

4. if $B(a(u_i))$ is a convex function of $u_i$ on $U$, then
for any $\tilde u_i \in W_3(\bar u_i)$ we have the estimate

$$0 \le \Delta(\bar a) \le h_i \langle f_u(x_i, \bar u_i, t_i)\, p_{i+1}(a(\bar u_i)),\ \bar u_i - \tilde u_i \rangle. \qquad (4.9)$$


Let us prove assertion 1. From the differentiability of the
composite function $B(a(u_i))$ it follows that

$$B(a(u_i)) - B(a(\bar u_i)) = \langle H_u(x_i, \bar u_i, t_i, p_{i+1}(a(\bar u_i))),\ u_i - \bar u_i \rangle + \|u_i - \bar u_i\|\, \beta(\bar u_i, u_i - \bar u_i),$$

where $\lim_{u_i \to \bar u_i} \beta(\bar u_i, u_i - \bar u_i) = 0$.
The condition $\bar u_i \in W_*$ implies that $B(a(u_i))$ attains its
minimum in $U$ at $\bar u_i$. Noting the convexity of $U$ and using
Theorem 1.4.1, we conclude that for any $u_i \in U$ we have the
inequality

$$\left\langle \frac{dB(a(\bar u_i))}{du_i},\ u_i - \bar u_i \right\rangle \ge 0, \qquad (4.10)$$

i.e., the condition (4.5) is satisfied.

Let us prove assertion 2. Using the differentiability of
$\psi(\bar u_i, u_i)$ in $u_i$ at $u_i = \bar u_i$, we rewrite (4.10)
as follows:

$$\left\langle \frac{d\psi(\bar u_i, \bar u_i)}{du_i},\ u_i - \bar u_i \right\rangle \ge 0.$$

Here $\psi(\bar u_i, u_i)$ is differentiated with respect to the
second argument. But from the pseudoconvexity of $\psi(\bar u_i, u_i)$
it then follows that

$$\langle F(x_i, \bar u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle = \psi(\bar u_i, \bar u_i) \le \psi(\bar u_i, u_i). \qquad (4.11)$$

Since the vector $u_i \in U$ is arbitrary, we conclude that the
conditions (4.3) and (4.4) hold.

Let us show assertion 3. Let $\bar u_i$ satisfy (4.4). Then for any
$u_i \in U$, (4.11) is satisfied. From the condition that $\bar u_i$
is a minimum of $\psi(\bar u_i, u_i)$ with respect to $u_i$ on the
convex set $U$, we obtain that for each $u_i \in U$ we have the
inequality

$$0 \le \left\langle \frac{d\psi(\bar u_i, \bar u_i)}{du_i},\ u_i - \bar u_i \right\rangle = \langle H_u(x_i, \bar u_i, t_i, p_{i+1}(a(\bar u_i))),\ u_i - \bar u_i \rangle,$$

i.e., (4.5) is satisfied. From the pseudoconvexity of $B(a(u_i))$
in $u_i$ we obtain that $\bar u_i \in W_*$.

The inequality (4.9) is obtained exactly as (4.6). ///

Omitting conditions of continuity and differentiability, we
can summarize the assertions of the last two theorems as follows:

(1) if $\bar u_i \in W_*$, then for (4.4) to be satisfied at this
point it suffices that either $\Omega$ be convex, or
$H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i)))$ be pseudoconvex in $u_i$ at
the point $\bar u_i$ with respect to the convex set $U$;

(2) if at $\bar u_i \in U$ the condition (4.4) holds, then for the
condition $\bar u_i \in W_*$ to be satisfied it suffices that either
the function $B(a)$ be pseudoconvex at $\bar a = a(\bar u_i)$ with
respect to $\Omega$, or the function $B(a(u_i))$ be pseudoconvex in
$u_i$ at $\bar u_i$ with respect to the convex set $U$.
In proving the necessary conditions for a minimum, the
convexity of $\Omega$ can be dispensed with if the variational
inequalities are treated locally. For example, instead of (4.4) we
require that the inequality

$$\langle p_{i+1}(a(\bar u_i)),\ f(x_i, u_i, t_i) - f(x_i, \bar u_i, t_i) \rangle \ge 0$$

be satisfied for all vectors $u_i \in U$ belonging to a
sufficiently small neighborhood of $\bar u_i$. The assertion that for
the differentiable function $B(a)$ the last inequality follows
from the condition $\bar u_i \in W_*$ is called the local control minimum
principle. We shall not formulate theorems on the local minimum 
principles, since they are similar to the statement of Theorem 


ee ie ie 


3. THE QUASIMINIMUM PRINCIPLE 


We shall compare the minimum principle formulated in Section 1.8
with the results of the preceding section. The justification of
the minimum principle for the system of differential equations
(1.1) does not require the convexity of $\Omega$, whereas for discrete
optimal control problems this condition is essential. It is not
hard to give examples in which $\Omega$ is not convex and the discrete
minimum principles in the space of controls and states do not hold.
Nevertheless, it is clear that by taking sufficiently small
integration steps in the numerical schemes for integrating the system
(1.1), it is possible to obtain arbitrarily close approximations
of the solution of the initial system (1.1) under very general
assumptions, the properties of the solutions obtained differing only
slightly. This apparent contradiction is easily dispelled if the
discrete principles are given a different interpretation,
delineated by Gabasov and Kirillova in [1].

Let us show that approximate discrete minimum principles hold
without the assumption of convexity, and the smaller the
integration step of the initial system (1.1) the closer the
approximation. Here we are talking about the minimum principle in the
space of controls and states, since the linearized minimum principle
for the system (1.1) holds only for convex $U$.

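The sets $W_1(\bar a)$, $W_2(\bar u_i)$ introduced in the preceding
subsection can be represented in the equivalent form

$$W_1(\bar a) = \{\tilde a \in \Omega:\ \langle p_{i+1}(\bar a), \tilde a \rangle = \min_{a \in \Omega} \langle p_{i+1}(\bar a), a \rangle\},$$

$$W_2(\bar u_i) = \{\tilde u_i \in U:\ H(x_i, \tilde u_i, t_i, p_{i+1}(a(\bar u_i))) = \min_{u_i \in U} H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i)))\}.$$

Instead of these sets we shall introduce sets obtainable by
solving similar minimization problems in which, however, only an
approximate minimum is sought, with an error not exceeding
$\varepsilon h_i$:

$$W_1^\varepsilon(\bar a) = \{\tilde a \in \Omega:\ \langle p_{i+1}(\bar a), \tilde a \rangle \le \min_{a \in \Omega} \langle p_{i+1}(\bar a), a \rangle + \varepsilon h_i\}, \qquad (4.12)$$

$$W_2^\varepsilon(\bar u_i) = \{\tilde u_i \in U:\ H(x_i, \tilde u_i, t_i, p_{i+1}(a(\bar u_i))) \le \min_{u_i \in U} H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i))) + \varepsilon h_i\}. \qquad (4.13)$$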
Obviously, for any $\varepsilon > 0$ we have the inclusions

$$W_1(\bar a) \subset W_1^\varepsilon(\bar a), \qquad W_2(\bar u_i) \subset W_2^\varepsilon(\bar u_i).$$

From (4.13) it follows that if $\tilde u_i \in W_2^\varepsilon(\bar u_i)$
then for any $u_i \in U$ we have the inequality

$$H(x_i, \tilde u_i, t_i, p_{i+1}(a(\bar u_i))) \le H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i))) + \varepsilon h_i.$$

The converse is also true: if this inequality holds for any
$u_i \in U$, then $\tilde u_i \in W_2^\varepsilon(\bar u_i)$. An
analogous property follows from the definition (4.12).
e 
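If, say, we rewrite the inequality in (4.13) in terms of the
system (1.1), then it becomes

$$h_i \langle f(x_i, \tilde u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle \le h_i \min_{u_i \in U} \langle f(x_i, u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle + \varepsilon h_i,$$

whence, cancelling $h_i$, we obtain that $W_2^\varepsilon(\bar u_i)$ is
the set of $\varepsilon$-optimal points in the problem of the minimum
of the scalar product
$\langle f(x_i, u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle$ on $U$:

$$W_2^\varepsilon(\bar u_i) = \{\tilde u_i \in U:\ \langle f(x_i, \tilde u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle \le \min_{u_i \in U} \langle f(x_i, u_i, t_i),\ p_{i+1}(a(\bar u_i)) \rangle + \varepsilon\}.$$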
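
To make the set of $\varepsilon$-optimal controls concrete, the following sketch enumerates it on a grid. The right-hand side `f`, the impulse vector `p`, the control set $U = [-1, 1]$ and the grid are hypothetical choices made only for this illustration.

```python
# Grid approximation of the exact and eps-optimal control sets for the
# scalar product <f(x, u), p>. All problem data here are hypothetical.

def f(x, u):
    # two-dimensional right-hand side; x is unused in this toy example
    return (u, u * u)

def eps_optimal_controls(x, p, grid, eps):
    """Grid points u with <f(x,u), p> <= min over the grid + eps."""
    values = [sum(fi * pi for fi, pi in zip(f(x, u), p)) for u in grid]
    vmin = min(values)
    return [u for u, val in zip(grid, values) if val <= vmin + eps]

grid = [(k - 100) / 100 for k in range(201)]   # U = [-1, 1], step 0.01
p = (1.0, 1.0)

W2 = eps_optimal_controls(0.0, p, grid, 0.0)        # exact minimizers
W2_eps = eps_optimal_controls(0.0, p, grid, 0.01)   # eps-optimal points
print(W2, len(W2_eps))
```

Here $\langle f, p \rangle = u + u^2$, so the exact minimizer is $u = -0.5$ and the $\varepsilon$-optimal set is the interval $|u + 0.5| \le \sqrt{\varepsilon}$, which contains the exact set, as the inclusions above require.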
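THEOREM 6.4.3. Let the set $Q = f(x_i, U, t_i) \subset E^n$ be bounded,
let the function $B(a)$ be defined on an open set containing the set
$\Omega$ and suppose there exists a point $\bar u_i \in W_*$ such that
$B(a)$ is differentiable at $\bar a = a(\bar u_i)$. Then for any
$\varepsilon > 0$ there exists $\bar h > 0$ such that for any
integration step satisfying the condition $0 < h_i \le \bar h$ the
following assertions hold:

$$\bar a \in W_1^\varepsilon(\bar a), \quad -\varepsilon h_i \le \langle p_{i+1}(\bar a),\ a - \bar a \rangle \quad \forall a \in \Omega, \qquad (4.14)$$

$$\bar u_i \in W_2^\varepsilon(\bar u_i), \quad -\varepsilon \le \langle p_{i+1}(a(\bar u_i)),\ f(x_i, u_i, t_i) - f(x_i, \bar u_i, t_i) \rangle \quad \forall u_i \in U. \qquad (4.15)$$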

Proof. From the boundedness of the set $Q$ there exists a number $d$
(the diameter of $Q$) such that for any $u_i, \bar u_i \in U$ the
condition

$$\|f(x_i, u_i, t_i) - f(x_i, \bar u_i, t_i)\| \le d$$

is satisfied.
To prove the theorem we need to estimate the difference
$B(a) - B(\bar a)$. The Newton-Leibniz and Lagrange formulas are not
suitable for this purpose since they hold only if $\Omega$ is convex.
To get rid of this constraint we use the formula (4.8) for
differentiating $B(a)$. From the condition $\bar a \in \Omega_*$ it
follows that for any $a \in \Omega$, $B(a) \ge B(\bar a)$, which,
noting (4.8), can be written

$$\langle p_{i+1}(\bar a),\ a - \bar a \rangle \ge -\|a - \bar a\|\, \alpha(\bar a, a - \bar a) \ge -\|a - \bar a\| \cdot |\alpha(\bar a, a - \bar a)|.$$

From the property of the limit of the function
$\alpha(\bar a, a - \bar a)$ as $a \to \bar a$ we obtain that for the
ratio $\varepsilon/d$ there exists a $\delta > 0$ such that
$|\alpha(\bar a, a - \bar a)| \le \varepsilon/d$ whenever

$$\|a - \bar a\| = h_i \|f(x_i, u_i, t_i) - f(x_i, \bar u_i, t_i)\| \le \delta.$$

This inequality holds for any $u_i \in U$ if the integration step
$h_i$ is such that

$$0 < h_i \le \bar h = \delta/d. \qquad (4.16)$$

In this case, since $\|a - \bar a\| \le h_i d$,

$$\|a - \bar a\| \cdot |\alpha(\bar a, a - \bar a)| \le \varepsilon h_i.$$

Whence we conclude that for any $h_i$ satisfying (4.16) and for all
$a \in \Omega$ we have the inequality

$$-\varepsilon h_i \le \langle p_{i+1}(\bar a),\ a - \bar a \rangle.$$

Thus (4.14) is satisfied at $\bar a$; passing to the control space
we obtain (4.15). ///


The requirement (4.14) can be called the discrete state
principle for a quasiminimum and (4.15) the discrete control principle.
This theorem makes clear that in solving many practical optimal
control problems, it is possible to use a sufficiently accurate
discrete approximation of the initial system (1.1) and ignore
completely the discrete minimum principle, using, instead, the
minimum principle for the system (1.1). Fedorenko [1], for example,
notes that based on computational experience, "the author has not
been able to draw any practical recommendations that would follow
from distinguishing between the discrete maximum principle and the
maximum principle for differential equations and which would be
efficient in computations."

Necessary conditions for a minimum are very useful for
solving practical problems, since they "screen" the nonoptimal points.
Most advantageous are the necessary conditions or sets of
conditions which can discard a large number of nonoptimal points. In
this regard, the theorems of the preceding subsection are more
essential than Theorem 6.4.3. The practical value of Theorem
6.4.3, which is, however, applicable to a wider class of problems,
can even be enhanced when used together with the local discrete
minimum principles mentioned in Subsection 6.4.2. For instance,
the points $u_i \in U$ yielding the minimum of the function $H$ with
error $\varepsilon h_i$ are likely to be optimal.


4. SECOND-ORDER NECESSARY CONDITIONS


In some problems, the condition

$$H(x_i, u_i, t_i, p_{i+1}(a(\bar u_i))) = H(x_i, \bar u_i, t_i, p_{i+1}(a(\bar u_i))) \qquad (4.17)$$

holds for any $u_i \in U$, $\bar u_i \in W_*$. These situations are
called singular regimes in optimal control theory. As in Section 24,
one can derive necessary conditions for a minimum in this case as
well. If $B(a)$ is twice differentiable at $\bar a$, then


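$$B(a) - B(\bar a) = \left\langle \frac{dB(\bar a)}{da},\ a - \bar a \right\rangle + \frac{1}{2}(a - \bar a)^T \frac{d^2 B(\bar a)}{da^2}(a - \bar a) + \|a - \bar a\|^2 \beta(\bar a, a - \bar a), \qquad (4.18)$$

where $\lim_{a \to \bar a} \beta(\bar a, a - \bar a) = 0$.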

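We use the notation introduced in Section 6.1:

$$p_{i+1}(\bar a) = \frac{db(x_q(\bar a))}{da} = \frac{dB(\bar a)}{da}, \qquad p_q = \frac{\partial b(x_q)}{\partial x_q},$$

$$P_{i+1}(\bar a) = \frac{d^2 b(x_q(\bar a))}{da^2} = \frac{d^2 B(\bar a)}{da^2}, \qquad P_q = \frac{\partial^2 b(x_q)}{\partial x_q^2}.$$

The vectors $p_s$ and the matrices $P_s$ are defined by the
recurrence relations of Section 6.1. The singularity condition (4.17)
can be rewritten in the state space as follows:

$$\left\langle \frac{dB(\bar a)}{da},\ a \right\rangle = \left\langle \frac{dB(\bar a)}{da},\ \bar a \right\rangle \quad \forall a \in \Omega,$$

which together with (4.18) implies that if $\bar a \in \Omega_*$ and
$\Omega$ is convex, then we have the inequality

$$(a - \bar a)^T P_{i+1}(\bar a)(a - \bar a) \ge 0 \quad \forall a \in \Omega.$$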


Just as in defining the linearized minimum principle, we take the
function $B(a(u_i))$ of $u_i$ and call the control singular if

$$\langle H_u(x_i, \bar u_i, t_i, p_{i+1}(\bar u_i)),\ u_i \rangle = \langle H_u(x_i, \bar u_i, t_i, p_{i+1}(\bar u_i)),\ \bar u_i \rangle \qquad (4.19)$$

for any $u_i \in U$, $\bar u_i \in W_*$. If $B(a(u_i))$ is a twice
differentiable function of $u_i$ at $u_i = \bar u_i$ and the condition
(4.19) is satisfied, then for any $u_i \in U$ we need to have the
inequality

$$(u_i - \bar u_i)^T \frac{d^2 B(a(\bar u_i))}{du_i^2}(u_i - \bar u_i) \ge 0,$$

where the matrix of the second derivatives is computed by the
formula (1.12). Similarly one can derive necessary conditions of
higher orders.


5. ACCOUNTING FOR INTEGRAL FUNCTIONALS 


All the above results easily extend to the case where the function
$R$ depends explicitly not only on the terminal state vector $x_q$
but also on intermediate vectors $x_j$. Let

$$R(x(w), w) = b(x_q) + \sum_{j=1}^{q-1} C(z_j). \qquad (4.20)$$

We extend the state space by introducing an additional state
variable $x^{n+1}$. Let

$$x_{j+1}^{n+1} = x_j^{n+1} + C(z_j), \qquad x_1^{n+1} = 0,$$

$$\tilde x_j = [x_j, x_j^{n+1}] \in E^{n+1}, \qquad R(x, w) = x_q^{n+1} + b(x_q). \qquad (4.21)$$


To the extended state space there corresponds an extended im- 
a 
pulse vector ope 4 for which Pj is definable by the for- 


mulas (1.4) and (1.7) and the last coordinate is 


A dR (x, w) OR (x, w) 
pitt= eatin Piti= a (a2 


The derivative of the function R with respect to &5 is given by 





iz dR a 
a d dx; GRIZGNAS Ee (z\epe 
p=a= dR =| x (2j;)+ aye ae , (4.23) 
i 





n+l 
dx} 
The derivative of the function $R$ with respect to $u_i$ is given


by 


dR _ 
Suge aeeiee Fala Pays 


and therefore coincides with (1.8). o 


We state the problem of minimizing $R(x(w), w)$ with respect
to the component $u_i$ of the vector $w$. To this end, we write $R$
as a composite function of the extended vector $\tilde a$:

$$R(x(w), w) = B(\tilde a) = B(\tilde a(u_i)), \qquad (4.24)$$

$$\tilde a(u_i) = \tilde x_{i+1} = [F(z_i),\ x_i^{n+1} + C(z_i)]. \qquad (4.25)$$


We assume that the functions defining the problem are differ# 
entaaple wee x.) Lnenstorreach vector u;< ime leet) mn @ltenc@ mec TNC! 


(4.22) define uniquely the sequence of extended impulses 


~ 


Bq? Pp 


gees Prat? This enables us to write 


Dy a1 = Pes (@) = Pyar (@(u,)) = [Pear (a (ey), 1]. 


Let us introduce some notation:

$$W_* = \operatorname{Arg\,min}_{u_i \in U} B(\tilde a(u_i)),$$

$$W_1(\bar{\tilde a}) = \operatorname{Arg\,min}_{\tilde a \in \tilde\Omega} \langle \tilde p_{i+1}(\bar{\tilde a}),\ \tilde a - \bar{\tilde a} \rangle,$$

$$W_2(\bar u_i) = \operatorname{Arg\,min}_{u_i \in U} \langle \tilde p_{i+1}(\tilde a(\bar u_i)),\ \tilde a(u_i) - \tilde a(\bar u_i) \rangle.$$

Here $\tilde\Omega = \tilde a(U)$ is the image of the set $U$.
Further arguments and formulations of theorems repeat almost
verbatim those given above. The condition that the set $\Omega$ be
convex is replaced by the condition that $\tilde\Omega$ be convex. In
particular, the necessary condition of a minimum (4.4) takes the
following form in the control space: if $\bar u_i \in W_*$ then for
any $u_i \in U$ we have

$$\langle p_{i+1}(F(x_i, \bar u_i, t_i)),\ F(x_i, u_i, t_i) - F(x_i, \bar u_i, t_i) \rangle + C(x_i, u_i, t_i) - C(x_i, \bar u_i, t_i) \ge 0.$$

An analogous generalization can be given to the conditions
(4.3) and (4.5).


6. ACCOUNTING FOR MIXED CONSTRAINTS 


We assume that the vector $w$ is fixed as before, with the
exception of its $i$th component $u_i$, which should be such that the
control $u_i$ belongs to $U$, the conditions (2.3) hold, and the
function $R_1(x(w), w) = b_1(z_q)$ takes on the smallest possible
value. We also assume that the set of solutions of this problem,
$W_*$, is nonempty. We denote by $w_*$ the complete control vector
$w$ for which one takes $u_i^* \in W_*$ as $u_i$; all the components
of $\bar w$ coincide with the components of $w_*$ except for the
$i$th component, equal to $\bar u_i$. We now introduce the auxiliary
function $R$


analOcOusmctOnGhayonco) 2 


R(x, W) =p; (24)-+<u, g(x, w)> +<v, A(x, w)>. (4.26) 
al} ~ e € 
Here wu eB. wy Guay; V Eola Set 


C(z;) = <a, T(z) + <0) T* (2,5) 
b (2g) = wh, (24) +<ug, T(zq)>+<0,, I aS 


Then the function R(x,w) coincides with the function GZ OD 


enabling us to use the formulas and notation (4.21)-(4.25). 
From minimization in the control space we proceed to
minimization in the state space. Thus the vector $x_{i+1}$ has to be
chosen from the set $\Omega = a(U)$ such that $R(x(w), w)$ takes on the
smallest value and the conditions (2.3) are satisfied, in which only
these are essential:


D1 (@, tpary bya) =" (242) =--- =T (2-1) =9, 
T's (z,)=90, PG; isa, Lees 
Le (2) oO erate yyeeo> I (eed) 
where Xi 49° Si4g7 00? Xo are composite functions of the vector 


x 


a ARs a4 Nevertheless, we keep the form of writing the auxiliary 
function (4.26), noting, however, nonessential constraints too. 
We shall use the general Theorem 1.7.6. By the formulas (4.22) - 


2 
(4.25) the condition (1.7.24) becomes 


AR (x (Wx), Wx) 
dxj44 


0<( ’ Xia (U;)—X p41 (ui) = 


=<Pi+1(Qe), @—a,.> Wu; EU. 
Or, passing to the control space, we obtain 


Os <p al (%, ure ty); Pixies, t,)—F (x, uj, t;)>+ 
+ <i, Wis 8 tad (Xie b> (4.27) 


+.<0;, T?(x;, uz, t))—T*), uj, t,)> Wa, EU. 
Using the linearized principle of the minimum, we have 


KF (%;, Uf, t:) Pisa (F (x, U7, EN) ATE (X, Ut, Fe) ar 
+12(x;, uj, t;)0;, uj—up> 20 Vu,EeU. 
Analogously, one can extend the remaining assertion of Theo- 
rems 6.4.1 and 6.4.2 to the case in question. 


Theorem 1.7.6 can be reformulated as follows: 
THEOREM 6.4.4. Let $\Omega$ be a convex set whose interior is nonempty.
The minimum of the composite function Dyke with respect to 


U; e€U, taking into account the constraints (Ao 3)) SS Bi Geannecl eye 
the point ul = Wy; the functions defining the problem are contin- 
uously differentiable with respect to the components of the vector 


of a 


xX. Then there exist u fe Hs,Vv. 20, not all equal. to zero, 


ee adi 
Suche thateron any u,¢ U the inequality (4.27) holds and the com- 


plementarity condition 


(Ayala ary)0 (428) 
is satisfied. 

Noting the feasibility of the vector u; and the condition 
(4.28), we write the inequality (4.27) in the form 

Dea P pagel is 82 al); Prue hye 
ee as La M(x;, u;, t;)>+<v,, 1? (%5 U5, t)>. 

If the set U is open and the functions determining the pro- 
blem are differentiable in U; > then by Theorem 1.3.1, from 
@Gl2Z Mista tollows) that 

FalXp Wis t) Pras (FX, uj, t:))+ 
ite Urs Oi; Te ix. e, £0, = 0, 


If the problem is one of convex programming, then these con- 
ditions are simultaneously sufficient for the minimum also. /// 
7. SOME GENERALIZATIONS 


Throughout this section, all components of the vector $w$ were
fixed except for $u_i$, which essentially simplified our discussion.
All results are easily extendable to the case where the complete
vector $w$ changes. We make the notation as close as possible
to that introduced in Subsection 6.4.1. The feasibility condition
OLeche COntrodsvecLory. Ww = [Uy sUgs+--s U4] eS G ccm eL Ug % Wh 

BYEGle cp mUOReCaAch Cettar w there corresponds a complete 

, 

state vector a= [X4sXoree+rX yl. We denote this dependence by 
a = a(w) and assume that 2 is the union of all possible complete 
state vectors corresponding to all possible feasible controls. 
The function b to be minimized depends explicitly on Fa and 
implicitly on the other components of the vector ae Gs shience we 
can use the earlier representation ee) = B(a), however, with a 
different meaning. If the vector function f is differentiable 


in Xj andthe eiune ta Ons) sans then by the formulas of Sec- 


aie 
tion 6.1, the function Bla) 4s differentiable in a and we have 
2 


B(a)—B(a) ees a—a)+ |a—ala(a, a—a)= 





= 3 <r @, 1% Ha—a]a@, a— 2). 


set & 


x are me BCs) lt fs Conve mand aoe then 2t-as 
ae 


necessary that for all ae 


(gee) 


= 
neat a—a } and) 


Analogously one can extend the results derived above to the com- 
plete vectors w and a. 

It is not hard to show that the results of this section are
also extendable to the Runge-Kutta integration of the system (1.1).
In this case the formulas are more cumbersome and hence we omit them.
5. NUMERICAL METHODS BASED ON DISCRETE MINIMUM PRINCIPLES


Discrete minimum principles make it possible to construct several
numerical methods for finding the points of the set $W_*$ defined
by (4.1). This is an auxiliary problem in implementing the
methods described in Section 6.3 for solving discrete optimal control
problems.

We use the notation introduced in Subsection 6.4.1. The
problem of minimizing $b(z_q)$ with respect to $u_i \in U$ can be
replaced by that of seeking the fixed points of the multivalued
mappings $W_1(a)$, $W_2(u_i)$, $W_3(u_i)$, i.e., the points satisfying
respectively condition (4.3), (4.4), (4.5). It is not hard to
formulate requirements in terms of the initial problem that are
sufficient to invoke Kakutani's theorem, providing sufficient
conditions for the existence of fixed points (see the Appendix). For
simplicity we will consider the point-set mapping $W_1(a)$.

LEMMA 6.5.1. Let the function $B(a)$ be continuously differentiable
and convex on an open set containing the convex compact set $\Omega$.
Then the multivalued mapping $W_1(a)$ has a fixed point $a_*$ at which
(4.3) holds and the set of preimages of $a_*$ lies in $W_*$.

Indeed, for each $a \in \Omega$ the set $W_1(a)$ is nonempty, convex,
compact and contained in $\Omega$, hence by Kakutani's theorem $W_1$
has a fixed point.

The vector $p_{i+1}$ is the gradient of the function $B$ with respect
to $x_{i+1}$; hence the minimization (used in defining the set
$W_1(\alpha)$) of the scalar product $\langle p_{i+1}, x_{i+1}\rangle$ with respect to
$x_{i+1} \in \Omega$ for the fixed vector $p_{i+1}$ is, in essence, a minimization of the
linearized function $B$. This is reminiscent of the conditional
gradient method described in Section 5.4.

We shall assume that $B$ is differentiable in $x_{i+1}$ and the
discrete minimum principle is satisfied for all $u_i \in W_*$. We consider
the $k$-th iteration for minimizing $B$ with respect to $u_i$; let the
vector $u_i^k$ and the corresponding vectors $p_{i+1}^k$, $x_{i+1}^k$ be known.
We shall describe several computational procedures for the conditional
gradient method.


We determine the auxiliary vector $\bar{x}_{i+1}$ by

$$\bar{x}_{i+1} \in \operatorname{Arg}\min_{x_{i+1} \in \Omega} \langle p_{i+1}^k, x_{i+1}\rangle. \tag{5.1}$$

Let $d_k = \bar{x}_{i+1} - x_{i+1}^k$ and $\delta(x_{i+1}^k) = \langle p_{i+1}^k, d_k\rangle$.

We assume that $\Omega$ is convex and closed.

We find the minimum of $B(x_{i+1})$ on the segment joining $x_{i+1}^k$
and $\bar{x}_{i+1}$:

$$\tau_k \in \operatorname{Arg}\min_{0 \le \tau \le 1} B(x_{i+1}^k + \tau d_k). \tag{5.2}$$

Next we determine the new vector

$$x_{i+1}^{k+1} = x_{i+1}^k + \tau_k d_k. \tag{5.3}$$

We find the corresponding gradient $p_{i+1}^{k+1}$, etc. The sequence
$\{B(x_{i+1}^k)\}$ decreases monotonically. Indeed, if $\delta(x_{i+1}^k) < 0$ then
we have

$$B(x_{i+1}^k + \tau d_k) = B(x_{i+1}^k) + \tau\,\delta(x_{i+1}^k) + O(\tau^2).$$

Hence at least for sufficiently small $\tau > 0$

$$B(x_{i+1}^k) > B(x_{i+1}^k + \tau d_k) \ge B(x_{i+1}^{k+1}).$$

If $\delta(x_{i+1}^k) = 0$ and $B(x_{i+1})$ is pseudoconvex, then for all
$x_{i+1} \in \Omega$, $B(x_{i+1}^k) \le B(x_{i+1})$. Thus the minimum of $B$ on $\Omega$ is
attained at $x_{i+1}^k$
and the minimization process stops at this point.
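The three steps of the state-space version (linear minimization over the set, search on the segment, update) can be sketched in a few lines. This is only an illustration under stated assumptions: the set is taken to be a box, so the linear subproblem is solved componentwise at a vertex; the function names `segment_search` and `conditional_gradient` and the quadratic test function are hypothetical; and the ternary search stands in for the exact one-dimensional minimization, assuming the function is unimodal along the segment.

```python
import numpy as np

def segment_search(phi, iters=100):
    """Ternary search for a minimizer of a unimodal phi on [0, 1] (assumed stand-in for (5.2))."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def conditional_gradient(value, grad, lower, upper, x0, iters=100, tol=1e-6):
    """Conditional gradient iteration on the box lower <= x <= upper."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        p = grad(x)                                   # gradient at the current point
        x_bar = np.where(p > 0.0, lower, upper)       # vertex minimizing <p, x> over the box
        d = x_bar - x                                 # direction toward the auxiliary point
        if float(p @ d) > -tol:                       # linearized decrease is (numerically) zero
            break
        tau = segment_search(lambda t: value(x + t * d))
        x = x + tau * d                               # step along the segment
    return x

# Hypothetical test function: squared distance to a point outside the box.
c = np.array([0.3, 2.0])
lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])
x_min = conditional_gradient(lambda x: float(np.sum((x - c) ** 2)),
                             lambda x: 2.0 * (x - c),
                             lower, upper, x0=[0.0, 0.0])
```

For this example the constrained minimizer is the projection of `c` onto the box; for a general reachable set the linear subproblem is not available in closed form, which is the difficulty the control-space version is meant to avoid.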
On these lines, one needs to solve the problem (5.1) of minimizing
a linear function on the set $\Omega$ and then to determine the
control $u_i$ ensuring transition of the system (1.2) from $x_i$ to
$\bar{x}_{i+1}$. The structure of the set $\Omega$ may be very complex, which
makes the solution of (5.1) difficult. It is simpler to pass from
the minimization in the state space to minimization in the control
space. Instead of (5.1) we seek

$$\bar{u}_i \in \operatorname{Arg}\min_{u_i \in U} \langle p_{i+1}^k, x_{i+1}\rangle = \operatorname{Arg}\min_{u_i \in U} H(x_i, u_i, t_i, p_{i+1}^k). \tag{5.4}$$

Next we set $\bar{x}_{i+1} = F(x_i, \bar{u}_i, t_i)$. By (5.2) and (5.3) we find
the step $\tau_k$ and the control $u_i^{k+1}$. Both versions of the method
generate the same sequence $\{x_{i+1}^k\}$ (under the condition of
unique solvability of the minimization problems); but numerically,
to implement the version in the control space is, as a rule, simpler.
Conversely, the state space is more convenient for proofs.


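A third version of the method can be devised, using a linearized
discrete minimum principle. We define the control $\bar{u}_i$ by

$$\bar{u}_i \in \operatorname{Arg}\min_{u_i \in U} \left\langle \frac{\partial H(x_i, u_i^k, t_i, p_{i+1}^k)}{\partial u}, u_i - u_i^k \right\rangle, \tag{5.5}$$

$$\tau_k \in \operatorname{Arg}\min_{0 \le \tau \le 1} B\bigl(u_i^k + \tau(\bar{u}_i - u_i^k)\bigr), \tag{5.6}$$

$$u_i^{k+1} = u_i^k + \tau_k(\bar{u}_i - u_i^k). \tag{5.7}$$

Obviously, in this case the values of $B$ also decrease monotonically
on each iteration.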
Using the results of Section 5.4, it is easy to formulate and
prove the convergence of all these schemes. Hence we limit ourselves
to a convergence theorem for the first version.

THEOREM 6.5.1. Let $B(\alpha)$ be continuously differentiable and
pseudoconvex (with respect to $\alpha$) on an open set containing the
convex, compact set $\Omega$ and let the gradient of $B(\alpha)$ satisfy a
Lipschitz condition on $\Omega$. Then the set of limit points of the
sequence $\{x_{i+1}^k\}$ is nonempty and consists of points $x_{i+1}^*$ at
which (4.3) holds, the corresponding controls being such that
$u_i \in W_*$.

If we simplify the conditional gradient method by dropping
the line search and, in addition, always assuming that $\tau_k = 1$, we
obtain the following methods:

$$x_{i+1}^{k+1} \in W_1(x_{i+1}^k), \tag{5.8}$$

$$u_i^{k+1} \in W_2(u_i^k), \tag{5.9}$$

$$u_i^{k+1} \in W_3(u_i^k). \tag{5.10}$$

These modifications do not ensure that the function to be minimized
decreases monotonically. Rather, one seeks the fixed points
of the point-set mappings $W_1$, $W_2$ and $W_3$ respectively, using
the simple iteration method. For such problems this method is
frequently divergent.
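A one-dimensional toy case shows why the simple iteration with unit step can fail. The sketch below is an assumption-laden illustration, not an example from the text: the function is $B(x) = x^2$ on the segment $[-1, 1]$, and the helper name `simple_iteration` is hypothetical. Each step jumps to the minimizer of the linearized function, which is always an endpoint, so the iterates alternate between the endpoints and never approach the minimizer $x = 0$.

```python
def simple_iteration(grad, lower, upper, x0, iters):
    """Unit-step iteration: x^{k+1} minimizes the linearized function over [lower, upper]."""
    x = float(x0)
    history = [x]
    for _ in range(iters):
        p = grad(x)
        # A linear function p * x on a segment is minimized at an endpoint.
        x = lower if p > 0 else upper
        history.append(x)
    return history

# B(x) = x**2 has gradient 2x; start from an interior point.
hist = simple_iteration(lambda x: 2.0 * x, -1.0, 1.0, 0.5, 6)
values = [x * x for x in hist]   # function values along the iteration
```

Here the function value rises from 0.25 at the starting point to 1 and stays there, which is the loss of monotone decrease mentioned above.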

If the discrete minimum principle holds for any $i$, then
these methods can be used, changing the control simultaneously for
all $i \in [1:q-1]$. Obviously, all the results extend to this case.

Such schemes of minimizing $R$ with respect to $w$ make it possible
to "decompose" the initial problem into many minimization problems
with respect to vectors $u_i$ of small dimension and, next,
choose the steps $\tau_k$. These schemes are alluring because the
computations are simple; however, if the initial control is known
only approximately, they converge slowly; in fact, the methods
(5.8)-(5.10) frequently diverge. Hence, first the function $R$ is
minimized with respect to $w$ by some unconstrained minimization
method, and only then one switches to these procedures.

Applied to the system (1.1), the process (5.9) coincides
with the method suggested by Krylov and Chernoous'ko [1]. To eliminate
the frequently observed divergence, Chernoous'ko and Banichuk
[1] suggest one should determine $\bar{u}_i$ from (5.4), and $\tau_k$ and
$u_i^{k+1}$ from (5.6) and (5.7). The convergence of this procedure, if
the $\bar{u}_i$ obtained from (5.4) are different from those obtained
according to the rule (5.5), needs special justification.

We note that the "decomposition" of the problem of minimizing
the function $R$ with respect to $w$ can be carried out without the
discrete minimum principle. Indeed, we can perform component-by-component
minimization of $R$, solving sequentially the problems
of minimizing $R$ with respect to $u_i$ for $i = 1, 2, \ldots, q-1$.
We apply, for example, the gradient-projection method in the
state space and control space, thus obtaining the following numerical
schemes:

$$x_{i+1}^{k+1} = \pi_\Omega\bigl(x_{i+1}^k - \tau\, p_{i+1}^k(x_{i+1}^k)\bigr),$$

$$u_i^{k+1} = \pi_U\bigl[u_i^k - \tau\, H_u\bigl(x_i, u_i^k, t_i, p_{i+1}(\alpha(u_i^k))\bigr)\bigr].$$

Here $\pi_X(x)$ denotes the projection of $x$ on the set $X$. Proof
of convergence and properties of the method follow from the
results of Section 5.5.
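A gradient-projection step is easy to write down when the admissible set is a box, because the projection is then a componentwise clip. The following is a minimal sketch under that assumption; the function name `projected_gradient`, the fixed step `tau`, and the quadratic test function are illustrative choices and are not taken from the text.

```python
import numpy as np

def projected_gradient(grad, lower, upper, x0, tau, iters=500):
    """Repeat x <- pi_X(x - tau * grad(x)) where X is the box lower <= x <= upper."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        # np.clip is the Euclidean projection onto a box.
        x = np.clip(x - tau * grad(x), lower, upper)
    return x

# Hypothetical test function: squared distance to a point outside the box.
c = np.array([0.3, 2.0])
x_pg = projected_gradient(lambda x: 2.0 * (x - c),
                          np.array([-1.0, -1.0]), np.array([1.0, 1.0]),
                          x0=[0.0, 0.0], tau=0.25)
```

With a constant step the method needs the step to be small relative to the Lipschitz constant of the gradient; the value 0.25 is adequate only for this particular test function.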
6. SOME GENERALIZATIONS


The foregoing approach can be applied to more general problems 
than described. Let us delineate basic possible generalizations, 
invoking, for the sake of simplicity, discrete approximations of 


the system (1.1) using the Euler scheme. 


1. OPTIMIZATION OF CONTROL PARAMETERS


The statement of this problem for systems described by ordinary
differential equations with control parameters is given in Section
1.8. We use discrete approximation of the system (1.8.10):

$$x_{i+1} = x_i + h_i f(x_i, u_i, t_i, \xi) = F(x_i, u_i, t_i, \xi). \tag{6.1}$$

The function

$$R(x(w), w, \xi) = b(x_q, u_q, t_q, \xi) + \sum_{i=1}^{q-1} h_i B(x_i, u_i, t_i, \xi)$$

is the control performance criterion. One must choose the complete
control vector $w$ and the control parameter vector $\xi$ so that the
function $R$ has the smallest possible value. Assuming that all
the functions determining the problem are $\xi$-differentiable and
using the arguments of Section 6.1, we obtain the following formula
for computing the derivatives of $R$ with respect to $\xi$:

$$\frac{dR}{d\xi} = b_\xi(z_q, \xi) + \sum_{i=1}^{q-1} H_\xi(z_i, p_{i+1}, \xi), \qquad z_i = [x_i, u_i, t_i], \tag{6.2}$$

where

$$H(z_i, p_{i+1}, \xi) = h_i B(z_i, \xi) + \langle F(z_i, \xi), p_{i+1}\rangle.$$

A necessary condition of a minimum is given by

$$b_\xi(z_q, \xi) + \sum_{i=1}^{q-1} H_\xi(z_i, p_{i+1}, \xi) = 0,$$

which can also be obtained directly from (1.8.11).

The formula (6.2) makes it possible to calculate the gradient
of $R$ with respect to $\xi$ and to apply in full the approach used
earlier in determining the vector $w$. Let us combine the control
vector and the control parameter vector, setting $\bar{w} = [w, \xi]$. Then
we minimize $R$ with respect to the extended vector $\bar{w}$. Therefore,
there is no fundamental distinction between the optimization with
respect to the complete control vector and that of the control
parameter vector. At the same time, the "incommensurate scaling" of
$w$ and $\xi$ sometimes complicates the numerical implementation of
the process. These vectors differ in their meaning and the extent
of their effect on $R$. Hence it is essential to make a special
scaling of the vector $\xi$. In many cases it is also useful to
decompose the process of minimizing $R$ with respect to $w$ and $\xi$.
First $R$ is minimized with respect to $\xi$ and next with respect to
the complete control vector $w$.
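The derivative with respect to the control parameter can be checked numerically on a small example. The sketch below is a hedged illustration, not the book's program: it assumes a scalar system $x_{i+1} = x_i + h f(x_i, \xi)$ with the hypothetical choices $f = -\xi x$, running cost $B = x^2$, terminal cost $b = x_q^2$, no control, and a constant step. The impulses are computed backward from $p_q = b_x(x_q)$ by $p_i = h B_x(x_i) + F_x(x_i) p_{i+1}$, and the derivative is accumulated as the sum of $h f_\xi(x_i) p_{i+1}$.

```python
def simulate(xi, x1=1.0, h=0.02, q=51):
    """Euler trajectory x_1, ..., x_q of x' = -xi * x (hypothetical test system)."""
    xs = [x1]
    for _ in range(q - 1):
        xs.append(xs[-1] + h * (-xi * xs[-1]))
    return xs

def cost(xi, x1=1.0, h=0.02, q=51):
    """R = b(x_q) + sum_{i=1}^{q-1} h * B(x_i) with b = x^2 and B = x^2."""
    xs = simulate(xi, x1, h, q)
    return xs[-1] ** 2 + sum(h * x * x for x in xs[:-1])

def cost_gradient(xi, x1=1.0, h=0.02, q=51):
    """dR/dxi by one backward sweep of the impulse recurrence."""
    xs = simulate(xi, x1, h, q)
    p = 2.0 * xs[-1]                      # p_q = b_x(x_q)
    grad = 0.0
    for i in range(q - 2, -1, -1):        # runs over x_{q-1}, ..., x_1
        x = xs[i]
        grad += h * (-x) * p              # H_xi = h * f_xi * p_{i+1}, with f_xi = -x
        p = h * 2.0 * x + (1.0 - h * xi) * p   # p_i = h * B_x + F_x * p_{i+1}
    return grad
```

A central finite difference of `cost` should agree with `cost_gradient` to within the difference error; the backward sweep costs about as much as one extra integration regardless of how many parameters there are.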

The control parameter vector can be introduced into the system
artificially. For instance, when the interval of motion $T$
is large, or high accuracy for numerical integration of the system
(1.1) is required, there is a need for a small integration step,
which leads to a high dimension of the vector $w$. In such cases
the control is sought in the "feedback" (or synthesis) form:

$$u(t) = \psi_1(t, \xi) \quad \text{or} \quad u(t) = \psi_2(x(t), t, \xi).$$

The functions $\psi_1$ and $\psi_2$ depend on $t$ as well as on the control
parameter vector. Substituting these formulas into the right side
of the system (6.1), we reduce the initial problem to an optimization
problem with respect to the control parameter vector. In
passing to discrete approximation, this enables us to lower the
dimension of the auxiliary problems of unconstrained minimization.


2. A PROBLEM WITH INCOMPLETE INITIAL CONDITIONS


For the system (1.2), some coordinates of the initial vector $x_1$
need not be specified. To define (1.2) more precisely, we introduce
an additional fictitious step determining $x_1$; using the formula
(1.4), we obtain

$$\frac{dR}{dx_1} = p_1.$$

Once the gradient is found, we can carry out the computations as
usual, changing the missing components of the vector $x_1$ and
improving the complete control vector $w$.

This formula can be used to solve boundary value problems by
Newton's method. Indeed, let the complete control vector $w$ be
fixed. Each of the vectors $x_1$ and $x_q$ consists of two vectors:
$x_1 = [\bar{x}_1, \tilde{x}_1]$, $x_q = [\bar{x}_q, \tilde{x}_q]$, where $\bar{x}_1, \bar{x}_q \in E^\nu$ and
the vectors $\tilde{x}_1$ and $\bar{x}_q$ are given but $\bar{x}_1$ and $\tilde{x}_q$ are unknown.
Given an arbitrary vector $\bar{x}_1$ we integrate (1.1) by some scheme
and determine the corresponding terminal vector. Thus, we have the
functional dependence $\varphi(\bar{x}_1)$, where $\varphi$ is a mapping from $E^\nu$ into
$E^\nu$. The problem consists in finding a vector $\bar{x}_1^*$ such that

$$\varphi(\bar{x}_1^*) = \bar{x}_q.$$
Newton's method yields the following computation scheme:

$$\bar{x}_1^{k+1} = \bar{x}_1^k - A_k^{-1}\bigl(\varphi(\bar{x}_1^k) - \bar{x}_q\bigr). \tag{6.3}$$

Here $A_k$ is the $\nu \times \nu$ matrix of the first derivatives of the vector
function $\varphi(\bar{x}_1)$, whose $j$-th row consists of the components of the
vector

$$\frac{d\varphi^j(\bar{x}_1^k)}{d\bar{x}_1}.$$

One can compute the coefficients of $A_k$ using the results obtained
in Section 6.1. To do this, we take the $j$-th component of $\bar{x}_q$
as $R$. Then, by virtue of (1.5), the impulse vector $p_q$ will have
all components equal to zero except the $j$-th component, equal to
one. We determine the vector $p_1$ from the recurrence relation
(1.7). We combine its first $\nu$ components corresponding to $\bar{x}_1$
and write $\bar{p}_1 \in E^\nu$. From the results of Section 6.1 it follows
that

$$\bar{p}_1 = \frac{d\varphi^j(\bar{x}_1)}{d\bar{x}_1}.$$

After the computations for $j = 1, 2, \ldots, \nu$, we have all the
elements of $A_k$. To this end, the recurrence relation (1.7) has
to be calculated $\nu$ times, upon which we can make one step of
Newton's method (6.3). In the numerical implementation of this
process, certain complications associated with divergence may
develop. This occurs if the initial approximation for $\bar{x}_1$ is known
to be bad. Hence, at first a gradient scheme is used; only after
that is Newton's method used.
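The Newton iteration for the unknown initial components can be sketched on a two-state example with one unknown. The code below is an illustration under assumptions that are not in the text: the system is the hypothetical oscillator $\dot{x}^1 = x^2$, $\dot{x}^2 = -x^1$ integrated by the Euler scheme, $x^1(0) = 0$ is given, $x^2(0) = s$ is unknown, and the derivative of the terminal map (here a $1 \times 1$ matrix) is approximated by a finite difference rather than by the impulse recurrence described above.

```python
def terminal_value(s, h=0.02, q=51):
    """Euler integration of x1' = x2, x2' = -x1 from (0, s); returns x1 at the last node."""
    x1, x2 = 0.0, s
    for _ in range(q - 1):
        x1, x2 = x1 + h * x2, x2 - h * x1
    return x1

def newton_shooting(target, s0, iters=10, eps=1e-6):
    """Newton iteration s <- s - A^{-1} (phi(s) - target) for a scalar unknown s."""
    s = s0
    for _ in range(iters):
        residual = terminal_value(s) - target
        if abs(residual) < 1e-12:
            break
        # Finite-difference stand-in for the derivative of the terminal map.
        a = (terminal_value(s + eps) - terminal_value(s)) / eps
        s = s - residual / a
    return s

s_star = newton_shooting(target=1.0, s0=0.0)
```

Because this test system is linear, the terminal map is linear in `s` and a single Newton step already meets the boundary condition; for nonlinear systems the quality of the initial approximation matters, as noted above.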
3. A VARIABLE TIME PROBLEM


For the system (1.2) it is required to find the complete control
vector $w$ and the interval of motion $[0, T]$ so that a function $R$
of the form (1.3) at the final time $T$ has the smallest possible
value.

We set and fix the sequence of integration steps $h_1, h_2, \ldots, h_{q-1}$.
The interval of motion is changed with the aid of the scale
factor $\xi$. As in Section 1.8, by introducing an additional
state variable we convert the system into an autonomous one and
substitute the independent variable $t = \tau\xi$. The variable $\tau$ varies
in the interval $[0, T_0]$, where $T_0 = \sum_{i=1}^{q-1} h_i$; the variable $t$ varies
in the interval $[0, T]$, where $T = \xi T_0$. A discrete approximation
of the system (1.8.18) by the Euler scheme yields

$$x_{i+1} = x_i + \xi h_i f(x_i, u_i, x_i^{n+1}), \qquad x_{i+1}^{n+1} = x_i^{n+1} + \xi h_i, \qquad x_1^{n+1} = 0. \tag{6.4}$$

The function (1.3) to be minimized is given by

$$R = b(x_q, u_q, x_q^{n+1}) + \xi \sum_{i=1}^{q-1} h_i B(x_i, u_i, x_i^{n+1}). \tag{6.5}$$

For the system (6.4) in standard form (6.1) one seeks the complete
control vector $w$ and the control parameter $\xi$ such that the function
$R$ assumes the smallest possible value. Applying the formula
(6.2) to the system (6.4), we obtain

$$\frac{dR}{d\xi} = \sum_{i=1}^{q-1} h_i \bigl[B(x_i, u_i, x_i^{n+1}) + \langle f(x_i, u_i, x_i^{n+1}), p_{i+1}\rangle + p_{i+1}^{n+1}\bigr], \tag{6.6}$$

$$p_i = p_{i+1} + \xi h_i \bigl[B_x(x_i, u_i, x_i^{n+1}) + f_x^\top(x_i, u_i, x_i^{n+1})\, p_{i+1}\bigr],$$

$$p_i^{n+1} = p_{i+1}^{n+1} + \xi h_i \bigl[B_t(x_i, u_i, x_i^{n+1}) + \langle f_t(x_i, u_i, x_i^{n+1}), p_{i+1}\rangle\bigr].$$
The conditions at the right endpoint are:

$$p_q = \frac{\partial b(x_q, u_q, t_q)}{\partial x_q}, \qquad p_q^{n+1} = \frac{\partial b(x_q, u_q, t_q)}{\partial t_q},$$

implying

$$p_i^{n+1} = \frac{\partial b(x_q, u_q, t_q)}{\partial t_q} + \xi \sum_{s=i}^{q-1} h_s \bigl[B_t(x_s, u_s, t_s) + \langle f_t(x_s, u_s, t_s), p_{s+1}\rangle\bigr], \qquad 1 \le i \le q-1.$$

The formula (6.6) is simplified if the initial system (1.1) is
autonomous and the function $B$ does not depend explicitly on $t$:

$$\frac{dR}{d\xi} = \sum_{i=1}^{q-1} h_i \bigl[B(x_i, u_i) + \langle f(x_i, u_i), p_{i+1}\rangle\bigr] + T_0\, b_t(x_q, u_q, t_q).$$

The connection of this result with (1.8.20) is obvious. These formulas
make it possible to construct gradient methods for the minimization
with respect to the parameter $\xi$, finding thereby the
optimal size of the interval of motion.


4. A MINIMAL TIME PROBLEM


It is required to find the complete control vector $w$ and the
corresponding solution of the system (1.2) so that the conditions
(2.3) be satisfied and the interval of motion be smallest.
We use the representation (6.4) but, in contrast to (6.5),
the function to be minimized is $R_1 = x_q^{n+1}$. Following (2.5), we have
the Lagrangian

$$R = x_q^{n+1} + \langle u, g(x, w)\rangle + \langle v, h(x, w)\rangle.$$

The formula (6.2) for this problem is

$$\frac{dR}{d\xi} = \sum_{i=1}^{q-1} h_i \bigl(\langle f(x_i, u_i, t_i), p_{i+1}\rangle + p_{i+1}^{n+1}\bigr).$$

The conditions at the right endpoint are:

$$p_q = g_x^\top(x_q, u_q)\, u + h_x^\top(x_q, u_q)\, v, \qquad p_q^{n+1} = 1.$$

For an autonomous system, in particular, we obtain

$$\frac{dR}{d\xi} = \sum_{i=1}^{q-1} h_i \bigl[1 + \langle f(x_i, u_i), p_{i+1}\rangle\bigr].$$


5. ACCOUNTING FOR NONDIFFERENTIABILITY OF COST FUNCTIONALS 


Among practical problems, it is often the case that the cost
functionals are not differentiable, for example:

$$R_1 = \max_{0 \le t \le T} \varphi(x(t), u(t), t), \qquad R_2 = \int_0^T |\varphi(x(t), u(t), t)|\, dt.$$

In spite of the fact that $\varphi$ is a differentiable function of its
arguments, the functionals $R_1$ and $R_2$ are differentiable only
directionally. For $R_1$ we introduce an additional control parameter
$\xi$ with respect to which we make minimization, and also a
new constraint, setting

$$R_1 = \xi, \qquad \varphi(x(t), u(t), t) - \xi \le 0, \quad 0 \le t \le T.$$

For $R_2$ we introduce a new control $\mu(t)$ and two additional
constraints, setting

$$R_2 = \int_0^T \mu(t)\, dt, \qquad -\mu(t) \le \varphi(x(t), u(t), t) \le \mu(t), \quad 0 \le t \le T.$$

The minimization is with respect to $u(t)$ and $\mu(t)$. In
both cases, the problems are reduced to standard form. The
nondifferentiability of the functionals is removed and all the methods
can be used without any additional modifications. One can treat
the simplest minimax problems (see, e.g., Davydov [1]) in a similar
way.
It makes sense to use this transformation if all the functions
determining the problem, except $R$, are differentiable with
respect to the components of the vectors $x$ and $u$. Otherwise,
there is no need to get rid of the nondifferentiable cost functions,
and the computations need to follow some less efficient
technique avoiding differentiation.


6. ACCOUNTING FOR BOX CONSTRAINTS 
Many optimal control problems involve constraints of the form

$$a \le u(t) \le b. \tag{6.7}$$

If solution methods are used which are based on auxiliary procedures
of unconstrained minimization, it is not advisable to treat
the constraints (6.7) as general-type constraints (2.3). The former
are simpler to account for in solving unconstrained minimization
problems. Indeed, one can show that most of the methods given
in Section 6.3 remain effective if instead of the auxiliary
unconstrained minimization of the function $R$ with respect to $w$,
one solves the problem of minimizing $R$ with respect to $w$ on
the set (6.7). Hence the library for unconstrained minimization
problems has to contain two kinds of programs: one that accounts
for constraints of the form (6.7) and one that does not.


7. ACCOUNTING FOR CONTROL "CONTINUITY"


In some optimal control problems, an additional constraint is
imposed on the control variation rate:

$$\left|\frac{du(t)}{dt}\right| \le c, \qquad 0 \le t \le T. \tag{6.8}$$

If $t_i$ and $t_{i+1}$ are neighboring points in the $t$-integration
grid, then in the discrete version this constraint becomes

$$-c h_i \le u(t_{i+1}) - u(t_i) \le c h_i.$$

These two constraints can be considered as inequality constraints
of the general type (2.3). In using the formulas of Section 6.1 for
computing the derivatives, one needs to take into account the
additional summands, since the constraints on the $i$-th integration
step depend on both $u_i$ and $u_{i+1}$. We note that the constraints (6.8)
can be introduced artificially for regularization.


8. PROBLEMS WITH DISCONTINUOUS RIGHT SIDES 


If in the system (1.1) the vector function $f(x, u, t)$ is differentiable
in $x$ and $u$ everywhere except at a finite number of points
$\{t_j\}$, where there is a discontinuity of the first kind, then the
sizes of steps $h_i$ need to be such that all the points $\{t_j\}$ are
nodes of the principal grid in $t$. The formulas given in Section
6.1 do not change. Similarly, if at some points $t_j$ of the principal
grid in $t$ the state trajectory has a preset discontinuity
not depending on $x$, $u$, then the formulas of Section 6.1 for
computing the derivatives do not change either. If the discontinuity
depends on $x$, $u$, i.e.,

$$x(t_j^+) = \psi\bigl(x(t_j^-), u(t_j)\bigr),$$

where

$$x(t_j^+) = \lim_{t \to t_j + 0} x(t), \qquad x(t_j^-) = \lim_{t \to t_j - 0} x(t),$$

then the impulses need to be recalculated at these points. In
determining $p_{j-1}$, instead of $p_j$ we need to take the vector
$\psi_x(x(t_j^-), u(t_j))\, p_j$, and in the formula for computing the derivative
$dR/du_j$ add the summand $\psi_u(x(t_j^-), u(t_j))\, p_j$. It is not
hard to account for discontinuities of more complex form (see,
e.g., Velichenko [2] and Chentsov [1]).


9. ACCOUNTING FOR DELAY


Let the control process be described by the relations

$$x_{i+1} = F(z_i, x_{i-s}, u_{i-s}), \qquad z_i = [x_i, u_i, t_i],$$

$$R = b(z_q) + \sum_{i=1}^{q-1} h_i B(z_i),$$

where $s$ is a positive integer and the $x_{1-s}, x_{2-s}, \ldots, x_1$ are
given and fixed; the complete control vector

$$w = [u_{1-s}, u_{2-s}, \ldots, u_0, u_1, \ldots, u_q]$$

is sought. Following the arguments of Section 6.1, we obtain that

$$p_i = h_i B_x(z_i) + F_x(z_i)\, p_{i+1} + F_x(z_{i+s})\, p_{i+1+s}\, \theta(i+1+s, q),$$

where $\theta(a, c) = 1$ if $a \le c$ and $\theta(a, c) = 0$ otherwise. Moreover,
setting $p_q = \partial R/\partial x_q$, we determine $p_{q-1}, p_{q-2}, \ldots, p_1$. Next we
compute the required derivatives of $R$ with respect to $u_i$:

$$\frac{dR}{du_i} = \bigl[h_i B_u(z_i) + F_u(z_i)\, p_{i+1}\bigr]\theta(1, i) + F_u(z_{i+s})\, p_{i+1+s}\, \theta(i+1+s, q).$$

Here $i \in [1-s : q-1]$, with

$$F_u(z_i) = \frac{\partial}{\partial u_i} F(x_i, u_i, t_i, x_{i-s}, u_{i-s}), \qquad F_u(z_{i+s}) = \frac{\partial}{\partial u_i} F(x_{i+s}, u_{i+s}, t_{i+s}, x_i, u_i).$$

Likewise, one can take into account more complex forms of delay.
In all these cases we can compute the derivatives of the function
$R$, reducing thus the problem to a form amenable to nonlinear
programming methods.


10. A SPECIAL TECHNIQUE 
Some problems involve constraints

$$\varphi(x(t), t) \le u(t) \le \psi(x(t), t), \tag{6.9}$$

where $\varphi$ and $\psi$ are $r$-dimensional vector functions. Such
constraints are a special case of (2.3) and may be accounted for by
the methods described earlier. However, it is simpler to introduce
a new control $v \in E^r$, setting

$$u^i(t) = v^i(t)\, \varphi^i(x(t), t) + \bigl(1 - v^i(t)\bigr)\, \psi^i(x(t), t), \tag{6.10}$$

where $i \in [1:r]$. We impose the constraints

$$0 \le v^i(t) \le 1. \tag{6.11}$$

We insert (6.10) for $u(t)$ in the right sides of (1.1) and
replace the initial problem by an optimization problem with respect
to $v(t)$ under the simple constraints (6.11). The conditions (6.9)
are thereby guaranteed. This simple technique makes it possible
to account for (6.9) without introducing additional state
constraints.
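The substitution can be illustrated numerically. The sketch below assumes the new control enters as a convex combination of the two bounds, componentwise, and that the lower bound does not exceed the upper bound; the function name `control_from_v` and the sample bound values are hypothetical. Clipping `v` to the unit interval plays the role of the simple constraints on the new control.

```python
import numpy as np

def control_from_v(v, phi, psi):
    """Map a new control 0 <= v <= 1 to u = v * phi + (1 - v) * psi, componentwise."""
    v = np.clip(np.asarray(v, dtype=float), 0.0, 1.0)   # enforce the simple constraints on v
    return v * phi + (1.0 - v) * psi

# Hypothetical bound values at one time instant (in general they depend on x(t) and t).
phi = np.array([-1.0, 0.0])
psi = np.array([1.0, 2.0])
u = control_from_v([0.25, 3.0], phi, psi)
```

Whatever value the optimizer proposes for `v`, the resulting `u` lies between the two bounds, so the state-dependent constraint never has to be handled as a general constraint.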


7. EXAMPLES OF NUMERICAL COMPUTATIONS


Many of the methods described above were tried out for solving a
variety of test and practical problems. We shall list some
results obtained on a BESM-6 computer. In order to follow the
convergence of the process closely, the same conjugate-gradient
method was used to solve auxiliary unconstrained minimization problems.


1. THE SIMPLEST MINIMAL TIME PROBLEM (ZERMELO'S PROBLEM)


The statement of this problem is given, for example, in the article
of Powers and Shich [1]. The control process is described by the
following system of differential equations:

$$\frac{dx^1}{dt} = \cos x^3, \qquad \frac{dx^2}{dt} = \sin x^3, \qquad \frac{dx^3}{dt} = u,$$

$$x^1(0) = x^2(0) = x^3(0) = 0.$$

The constraint $|u(t)| \le 0.5$ and two terminal constraints
$x^1(T) = 4$, $x^2(T) = 3$ are imposed. The functional is the time of
motion, i.e., $R = T$. For definiteness, time is measured in seconds.
The problem has an obvious physical interpretation. Let $x^1$, $x^2$ be
the Cartesian coordinates of a material point moving in the plane.
Its velocity vector is equal to one in modulus and forms an angle $x^3$
with the positive direction of the axis $Ox^1$ (Figure 3). At the
initial moment $t = 0$ the velocity vector is directed along the axis
$Ox^1$, i.e., $x^3(0) = 0$. It is required to move the point from the
origin to the point $[4, 3]$ in the shortest time. If by a material
point we mean, say, a car, then we can regard the turn angle of the
drive wheels as a control, and the constraint $|u(t)| \le 0.5$ has thus
a simple physical meaning.

Figure 3
Proceeding to numerical solution of the problem, we make a
change of independent variable, setting $t = \tau T$. Then the system
takes the form

$$\frac{dx^1}{d\tau} = T\cos(x^3), \qquad \frac{dx^2}{d\tau} = T\sin(x^3), \qquad \frac{dx^3}{d\tau} = Tu.$$

It is integrated for $0 \le \tau \le 1$ by the Euler scheme with constant
step $h = 0.02$, which corresponds to the number of discretization
points $q = 51$. The total time of motion $T$ becomes a control
parameter.
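The discretized system is small enough to write out directly. The following sketch assumes a piecewise-constant control given as a list of $q - 1 = 50$ values on the scaled interval; the function name `integrate_zermelo` and the trial value of $T$ in the example are illustrative choices.

```python
import math

def integrate_zermelo(u, T, h=0.02):
    """Euler integration of the time-scaled system on 0 <= tau <= 1.

    u is a sequence of control values, one per integration step;
    T is the total time of motion, treated as a control parameter.
    """
    x1 = x2 = x3 = 0.0
    for u_i in u:
        # Explicit Euler: all right sides use the state at the start of the step.
        x1, x2, x3 = (x1 + h * T * math.cos(x3),
                      x2 + h * T * math.sin(x3),
                      x3 + h * T * u_i)
    return x1, x2, x3

# Zero control: the point moves along the first axis at unit speed.
end_state = integrate_zermelo([0.0] * 50, T=5.0)
```

With zero control and $T = 5$ the point travels a distance of 5 along the first axis, which shows that this control covers the required distance but misses the terminal point $[4, 3]$.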
The initial approximation is $u_0(\tau) = 0$, $T_0 = 3$ sec. The
constraint $|u(t)| \le 0.5$ is treated as a "box constraint":
$-0.5 \le u(t) \le 0.5$ (see Section 6.6) and is accounted for in the
conjugate-gradient method. All computations are carried out in
the dialogue (interactive) mode with the aid of the DISO system.
Solution of the problem starts with the first version of the
cost-function parametrization method OPT45. The lower estimate of
the optimal value of the functional in this problem is easily
obtainable. The shortest distance between the points $[0, 0]$ and
$[4, 3]$ is 5, and it is covered in 5 sec. at unit velocity. Since
the initial direction of the velocity vector coincides with the
axis $Ox^1$, the trajectory and motion time will be greater than
these values. The lower estimate of the optimal value of the
functional $T_* = 3$ sec. satisfies the conditions for OPT45. After the
first two iterations, the computation continues by OPT53 (the
modified Lagrangian method).
In Table 1 the values of the functional are shown in the
respective iterations of each method. When the computation ends,
the result is $R = T_* = 5.12277$ sec., $x^1(T_*) = 3.99997$,
$x^2(T_*) = 2.99999$, which one may regard as optimal with a sufficiently
high degree of accuracy. The corresponding control and trajectory
are shown in Figures 4 and 5.


Table 1

Method OPT45
Number of iterations        0          1          2
Value of the functional     3.00000    3.05383    4.51062

Method OPT53
Value of the functional (successive iterations)
5.12532    5.12416    5.12297    5.12284    5.12277

The problem was also solved by another combination of algorithms:
first by OPT41 (2 iterations) and next by OPT53 (5 iterations).
The results are given in Table 2.

Figure 4        Figure 5


Table 2

Method OPT41
Number of iterations        0          1          2
Value of the functional     3.00000    4.61421    4.78960


Method 


Number of 
Iterations 


Value of the i 
Functional Heel SOM ORE ao o) Sens By, WAGE |) Bo ULASS 





The final values of the functional and the state coordinates
were close to those given above: $R = T_* = 5.12265$ sec.,
$x^1(T_*) = 3.99998$, $x^2(T_*) = 2.99999$. The corresponding control is
shown in Figure 6; the explicit "switching" form should be noted:
at first, an abrupt turn in the direction of the terminal point
$[4, 3]$ with the greatest possible angular velocity and next, motion
on the line. The switching occurs at the moment when the velocity
vector is first directed toward the required point $[4, 3]$. Although
the end results in both versions of the problem are very close, the
graphs of the controls differ after the switching. The method OPT45
introduced these "defects" in the graph in Figure 4. Subsequent
computations by OPT53 bridged the gap between the "peaks" only
slightly but did not eliminate it. The control shown in Figure 6
seems to be more physically correct than others. One may draw two
conclusions: first, the well-known fact that convergence of the
functional does not imply at all convergence of controls; second,
differing algorithms are useful since it then becomes possible to
analyze constructively and choose the easiest implementable controls.

Figure 6 (control $u(t)$; the upper level of the graph is 0.5)


2. A SIMPLE PROBLEM WITH STATE CONSTRAINTS


The statement of the problem has been taken from the article of
Mehra and Davis [1]. The numerical results are given therein. The
control process is described by the equations

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = -x^2 + u, \qquad x^1(0) = 0, \quad x^2(0) = -1. \tag{7.1}$$

The problem consists in minimizing

$$R = \int_0^1 \bigl[(x^1)^2 + (x^2)^2 + 0.005\,u^2\bigr]\, dt$$

under the constraints along the trajectory

$$\Phi(x, t) = x^2(t) - 8(t - 0.5)^2 + 0.5 \le 0.$$

The system (7.1) was integrated by the Euler scheme with
recalculation with constant step $h = 0.02$ ($q = 51$). The control was
assumed constant within the integration interval. The initial
approximation was $u_0(t) = 0$. The approximating problem of nonlinear
programming consists in minimizing the function of 50 variables
under 50 inequality constraints $\Phi(x_i, t_i) \le 0$.

The strategy of seeking the solution is similar to the one
used in the previous problem, i.e., after the first three iterations
by the penalty-function method OPT413, the computation continues
by the first version of the simple iteration method OPT533.
In Table 3 one can see how the functional $R$ changes depending on
the number of the iteration. The optimal trajectory $x^2(t)$ and
the corresponding control law $u(t)$ are represented in Figures 7
and 8. In Figure 7 the lower curve is the trajectory $x^2(t)$ under
the initial control $u_0(t)$.
Table 3 


Method OPT413 


Number of 
ho eee ee, 
Value of the | 


Functional OVALGW 267) 















0.600368 





0.163748 0.166120 












Method OPT533
Value of the functional (successive iterations)
0.169152    0.169468    0.169487    0.169480    0.169480












Figure 7 Figure 8 
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The "jump-on" time (ty = ORiar Ge = -0.179077) and "jump- 
off" time (ty = Oe ace Ge) = -0.179997) with respect to the state 
constraint yield oscillations (break points) on the graph of the 
optimal control law u(t). €iIn this solution of the problem, some 
constraints are not satisfied, the maximum violation being 
max T, = 3+107+, 

: This problem was solved using also the programs OPT41, OPT53 
and OPTSin which the system (7.1). was integrated by the Euler 
scheme. ln, the first series ot computations, the first three ite- 
rations were made by OPT41 and next eight iterations by OPT53, ob-— 


taining: R, = 0.181378 and max T 5 = o-10n. In the second series 


* 
iy 
of computations, one iteration first by OPT41 yielded approximate 
values of the dual variables, obtaining: R, = 0.187064, 
max qr. = 3-10. upon which the program OPT8 was used via 


i 
Newton's method and in eight iterations the results showed high 


12 


accuracy of the constraints: max De = Ate LOe os Ry = 0.181378. 
i 


The controls differ from those in Figure’6 by/a quantity of 


order 10> 


and it is impossible to observe this difference graph- 
ically. That is why we itemize in Table 4 the values of optimal 
convrol and, of the constraints which have been computed by the for- 
mulea.C38.7), Note. that.in ‘this problem we again encounter the sit- 
uation mentioned in Section 6.3 concerning the program OPT8, when 
the function does not depend on the controls explicitly.+ Using 

the program OPT8 with the number of integration steps q = 70, we 
obtain Re =,0.17790..4, Thusithe value of R, approaches the values 


upon integration of the system (7.1) by the Euler scheme with re- 


calculation. The most exact solution can be found by Newton's 


ROMA 








» 300398, 5 +01 


. 756003 


107° 
» 299726, 9 +00 


- 434525, 4 +00 


- 008026, 9 +00 
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Table 4 





. 400000 
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. 488000 
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. 424800 
method, but we have not yet obtained such results. What we have
obtained so far attests, however, to the potential superiority of 
this method. This problem might be the touchstone of the effi- 
ciency of the numerical methods, and was used as such by Mehra and 


Davis [1] and by Fuji, Fujimoto, and Ono [1]. 


3. THE PROBLEM OF VERTICAL ASCENT OF A ROCKET 


Without exaggeration, one may call this problem the classical test
problem of optimal control. The statement and solution are given
in Okhotsimskij [1], Ehneev [1], and Fedorenko [1]. The control
process is described by the following differential equations:

$$\frac{dx^1}{dt} = -u, \qquad \frac{dx^2}{dt} = x^3, \qquad \frac{dx^3}{dt} = \frac{Vu - Q(x)}{x^1} - g,$$

where $x^1(t)$ is the mass of the rocket; $x^2(t)$ is the altitude
above the earth's surface (km); $x^3(t)$ is the rocket velocity
(km/sec); $u(t)$ is the mass flow rate (sec$^{-1}$); $V = 2$ km/sec is
the gas nozzle velocity; $g = 0.01$ km/sec$^2$ is the acceleration
due to gravity (assumed constant); $Q(x)$ is the aerodynamic drag
defined by the formula $Q(x) = 0.05 \exp(-0.01 x^2)(x^3)^2$.

The initial state of the rocket is: $x^1(0) = 1$, $x^2(0) = 0$,
$x^3(0) = 0$; at the terminal time $T = 100$ sec.
the mass should be 20% of the initial mass, i.e., $x^1(T) = 0.2$.
We have thus the box constraint $0 \le u(t) \le 0.04$. It is required
to find the mass flow rate yielding the rocket's peak altitude.
According to our formulation of the general optimal control problem, the
functional is $R(x(T)) = -k_1 x^2(T)$ and the terminal equality-type
constraint is $g^1(x(T)) = k_2[x^1(T) - 0.2] = 0$. The scale factors are
$k_1 = 0.01$ and $k_2 = 10$.
To solve this problem numerically, the segment $[0, T]$ is divided
into 100 equal parts ($q = 101$) and the system of differential
equations is integrated by the Euler scheme with recalculation.
The initial control is given by the function $u_0(t) = 0.008$. The
DISO system for solving optimal control problems calls for no special
maneuvers to eliminate the box constraints. This helped
avoid the harmful "sticking" of the control to the boundary (see
Ehneev [1]). The problem was solved in a dialogue mode using a
series of popular methods. The first iteration by the penalty
function method OPT413 yielded: $x^1(T) = 0.17471$, $x^2(T) =
133.642$ km; the next three iterations by OPT553 yielded:

Cr) = 0.200000614, SoD) = 1325133) km. =the optimal control Law 


is graphed in Figure 9. 
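The discretized dynamics described above can be sketched as follows. This is only an illustration under stated assumptions: "Euler scheme with recalculation" is read here as the predictor-corrector (Heun) variant, the drag exponent is taken with the sign printed in the text, and all function names (`drag`, `rhs`, `heun_step`, `simulate`) are hypothetical. The sketch integrates the state for the constant initial control $u_0(t) = 0.008$ and makes no attempt at optimization.

```python
import math

V = 2.0        # gas nozzle velocity, km/sec
G = 0.01       # acceleration due to gravity, km/sec^2
T = 100.0      # terminal time, sec
STEPS = 100    # 100 equal parts, i.e. q = 101 nodes


def drag(x2, x3, exponent=0.01):
    # Q(x) = 0.05 exp(0.01 x^2) (x^3)^2; the sign of the exponent is an assumption.
    return 0.05 * math.exp(exponent * x2) * x3 ** 2


def rhs(state, u):
    # Right-hand sides of the three state equations.
    x1, x2, x3 = state
    return (-u, x3, (V * u - drag(x2, x3)) / x1 - G)


def heun_step(state, u_now, u_next, h):
    # Predictor: plain Euler step. Corrector: average of the two slopes.
    k1 = rhs(state, u_now)
    predicted = tuple(s + h * k for s, k in zip(state, k1))
    k2 = rhs(predicted, u_next)
    return tuple(s + 0.5 * h * (a + b) for s, a, b in zip(state, k1, k2))


def simulate(control):
    h = T / STEPS
    state = (1.0, 0.0, 0.0)   # x^1(0) = 1, x^2(0) = 0, x^3(0) = 0
    for i in range(STEPS):
        state = heun_step(state, control(i * h), control((i + 1) * h), h)
    return state


mass, altitude, velocity = simulate(lambda t: 0.008)
print(f"x1(T) = {mass:.5f}, x2(T) = {altitude:.3f} km, x3(T) = {velocity:.4f} km/sec")
```

With a constant control the mass equation is integrated exactly, so the terminal mass equals $1 - 0.008 \cdot 100 = 0.2$; the terminal altitude depends on the drag model assumed above.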


[Figure 9: the optimal control $u(t)$ for $0 \le t \le 100$; the upper bound 0.04 is marked on the vertical axis.]


The comparison of these results with those obtained by Ehneev
[1] demonstrates a qualitative and quantitative consistency of the
optimal control laws. During the first six seconds of flight,
the mass flow rate is maximal, i.e., 0.04 sec$^{-1}$; then it drops
drastically to 0.013-0.014 sec$^{-1}$, remaining unchanged up to the
42nd second or near. The flow rate then drops to zero, and from
the 45th second to the end of the flight it does not change any
more. According to Ehneev [1], this control law "possesses all
the major well-known features of the optimal mass flow."

Table 5 contains the values of the functional and of the terminal
mass $x^1(T)$ from the computations using the DISO-system methods,
Ehneev's methods (I) [1], and Fedorenko's methods (II) [1].


[Table 5: the body of the table is not recoverable from the source.]

The results obtained by the method OPT413 only are given in
Table 6 and in Figure 10, with the curves of the initial control
$u_0(t) = 0.008$ and the control $u(t)$ after the first, second and
fifth iterations.


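[Figure 10: the initial control $u_0(t) = 0.008$ and the control $u(t)$ after the first, second and fifth iterations of OPT413, for $0 \le t \le 100$.]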


Table 7 contains the results of computations for $q = 201$
discretization points, i.e., the integration step is half the size.
The total time of computation has increased to 420 seconds (by a
factor of ...).
Table 6

Number of iteration |    0    |    1    |    2    |    3    |    4    |    5
$x^2(T)$, km        | 54.687  | 133.642 |   ...   |   ...   |   ...   |   ...
$x^1(T)$            | 0.20000 | 0.17471 | 0.18617 | 0.19333 | 0.19673 | 0.19838


Table 7 


Method               | OPT413 | OPT553
Number of iterations | ...
$x^2(T)$, km         | 54.697 | 146.519 | 131.684 | ...
$x^1(T)$             | 0.20000 | 0.170345 | 0.199996





4. AN OPTIMAL TURNING PROBLEM FOR FLIGHT VEHICLES


The dynamic model used is based on the following assumptions: the
flight vehicle is a material point moving in the three-dimensional
right-handed coordinate system attached to the earth; the sidereal
motion of the planet is ignored; the earth is assumed to be flat;
the acceleration due to gravity is constant with respect to the
altitude. The motion equations have the form
: m2 ; 
oe = see cos (x?) cos (x®) ; x = ne Since) , 
oe cosa - C.qS 
ge = oe cos (x°) pence) 5 x4 = ¢ ————_, - ite) , 
x 
iO umes To ieos ae) - Conte) Ome Nec) 
i ee ee a eee eae Go Ae a Bani 
x x COS (es) 

CA 

x = = te j 
where $x^1$, $x^2$, $x^3$ are the Cartesian coordinates of the vehicle,
$x^2$ is the altitude, $x^4$ is the velocity, $x^5$ is the flight
path angle, $x^6$ is the course angle, $x^7$ is the vehicle mass, $u^1$
is the thrust related to the maximum thrust $P$, $u^2$ is the overload
relative to the maximum overload $N$, $u^3$ is the brake force
relative to its maximum, $u^4$ is the bank angle, $\alpha$ is the angle
of attack, $g = 9.81$ m/s$^2$, $S$ is the frontal area, $q$ is the
dynamic pressure, $C_x$ is the aerodynamic drag coefficient, $C_s$ is
the fuel consumption per second.


We use the following relations: 


een en 
cas piles) Gx} 
2 ’ 
BCRP =) 398010 19¢ x2) >) = 01) nes sa0nex 2 a0onaes a 
> dO Loaf 250008 a x2 
Bay 12.5 
A(x) due bSA0b Boe OSs 10n x2 « 
= u2Nx ! 
CON ee en 
Ug Deo AnGG Ss 
bee 0-74 20.8 1a P 
s S000 a tas 
S = 55m? , 
_ .. fas 150000 
N min ite = a Oe Sie 
x x 
: 2 3 
C, Mie) 0.0284) 3.17407 S%0.03u 


Taking the approach suggested by Isaev and Sonin in [3], the
function $N(x)$ has been "smoothed".


In addition to the "box" constraints 
Cros 2al¢e) <7, Ondit en ayee aL Vn Gio) wel hehe dere 


the following "continuity" constraints are imposed: 




“t 2 
du du 
[se OU. on S| SG Mage 
(7.3) 
3 4 
{sa erie joa < 1.57 rad/sec 
The initial state is: 
a (0b o=0 x (0) 0) 
x7(0) = 5000m , x7 (0)L.= 1300 mis ., 
A 1COpe= TOs tO 4 
7 yi 
FCO yee 20000) ke 


ee ee 2 il 4 
The initial values of the controls -u~(0)s= >-—77.,, 00 (0) — 0; 
guaranteeing horizontal flight of the vehicle at an instant of time 
t = 0, are not changed during the computations. It is required 
that the terminal values of these controls satisfy the horizontal 


fli bacOndast Ons mleCias, 


2 ab 4 p 
Wed) NCx(T)) ’ eG Ts Sf OF ee (7.4) 
The state coordinates of the vehicle at the end of the flight 
need to be 


x2(T) = 7000m_, ery a= OL, Sl) =e ee 


We may formulate the following minimal-time problem: find
the controls $u(t)$ satisfying the constraints stated above and
bringing the vehicle in the shortest time from horizontal flight
at the altitude of 5000 m to horizontal flight at the altitude of
7000 m with the reversal of the velocity vector. Thus, time is
taken as the functional $R = T$ and the conditions (7.4), (7.5)
form a system of terminal equality-type constraints. Starting the
numerical solution of the problem, the variable $t$ has been changed
(see Section 1.8) and the differential equations have been
integrated on $[0,1]$ by the Euler scheme with constant step $h = 0.02$


(q = 51). The initial approximation was: To = 21 sec and the 
finétions "ut eye; uo (ey Sholay ues Oe utp ees een [Fe]. 
[Figure 11: the functional $T$ (sec) versus computation time (min) for algorithms 1-6.]

Six different methods were used to solve the problem. In
Figure 11 the graphs represent the functional $T$ (in seconds) as
a function of the computational time $t$ (in minutes) on a BESM-6
computer. The dots represent the values of $T$ at the end of each
iteration, with interpolation between each two iterations. The
numbers on the curves designate the following algorithms: 1 is
the penalty method OPT41, 2 is the Lagrange multiplier method
OPT7, 3 is the modified Lagrangian method OPT53, 4 is the simple
iteration method OPT55, 5 and 6 are OPT45 and OPT46. The
methods OPT7, OPT53 and OPT55 require the knowledge of the dual
variables. Hence these three methods were applied only after one
step made by the penalty method. Solution of the problem by each
particular method was stopped if either two successive values of
$T$ differed from each other by a quantity smaller than $10^{-4}$ or
the allocated computer time had run out. In Figure 11 we can see 
that the methods OPT53 and OPT55 have the highest convergence rate. 
The feasible accuracy of solution was reached in 25 minutes. Using 
the penalty method OPT41, it took almost an hour to obtain similar 
results. For OPT46 the accuracy of solving the auxiliary problem 


had to be raised versus OPT45, otherwise the value T after the 


second step exceeded the optimal value $T_* \approx 16.96$ sec.


Solving this problem requires considerably longer computer 
time than the preceding cases. The reason is that the discretiza- 
tion yields a nonlinear programming problem of finding the minimum 
in 201 variables satisfying five equality-type constraints and 702 
inequality-type constraints, with 302 box constraints taken into 
account in solving the auxiliary problem. The vector of dual
variables had dimension 405.
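The quoted dimension of the dual vector agrees with counting one multiplier for every constraint that is not a box constraint. Reading the text this way (box constraints need no multipliers because they are handled inside the auxiliary problem) is an interpretation, not a statement made in the source:

```latex
\[
\underbrace{5}_{\text{equality-type}}
\;+\;
\underbrace{(702 - 302)}_{\text{inequality-type, box constraints excluded}}
\;=\; 405 .
\]
```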

In conclusion, let us examine the optimal programs $u(t)$ and
the overload $n_y(t) = u^2(t)N(x(t))$ obtained by OPT55: they are
shown in Figures 12, 13, 14, 15. The corresponding values of the
controls and state variables are such that $x^2(T) = 7001$ m,
$x^5(T) = 0$,
[Figures 12, 13, 14, 15: the overload $n_y(t)$ and the optimal controls as functions of time.]
x (GUD mp eOn! enc e309 u"(T,)°N(x(T,)) -1 = 1074, 


The control $u^2(t)$ is of a simple form and "boundary" in the
sense of the constraints. At the start of the flight
($0 \le t \le 3.39$ sec) $u^2$ grows from 0.133 to 1 at the maximum
admissible rate ($du^2/dt \approx 0.25$); at the end of the flight
($13.90 \le t \le T_*$) $u^2(t)$ falls from 1 to 0.226, also at
the maximum admissible rate ($du^2/dt \approx -0.25$); in the middle
of the flight ($3.39 \le t \le 13.90$ sec) $u^2(t) = 1$. The graphic
representation of the corresponding overloads $n_y(t)$ is more
difficult. The reason is that before $t \approx 7.45$ sec the function
$N(x) = 150000/x^7$ is practically constant
($N(x(7.45)) - N(x(0)) \approx 0.02$) because of an insignificant
change in weight ($\Delta x^7 = 19940$ kg $- 20000$ kg $= -60$ kg).
At $t \approx 7.45$ sec the function $N(x)$ is defined for the first
time by the formula containing $qS/x^7$ and becomes essentially
nonlinear. This occurs since both the velocity and the atmosphere
density decrease as the altitude increases. The corresponding fall
in the dynamic pressure $q$ results in $qS < 150000$, and the maximum
overload has to be computed by a different formula.


5. THE PROBLEM OF OPTIMAL DISTRIBUTION OF A STRUCTURE 


Historically, most of the traditional problems of optimal control
involved motion of moving objects, such as airplanes, rockets, etc.
As the methods of optimal control theory developed and improved,
they were applied in other areas as well. As an example we consider
the problem of designing a structure of minimum weight withstanding
the design safe stress. Under some assumptions, the behavior of such
structures is described by a system of differential equations
instrumental to the solution of optimization problems. These
equations are due to G.I. Pshenichnov, who cooperated with this
author in stating and solving the problem of optimal load
distribution. The variables determining the problem are examined in
detail in Grachev and Evtushenko [5] and Kashin, Pshenichnov and
Flerov [1]. In the latter, numerical results have been obtained by
the methods described in this subsection. Hence optimization will be
the focus of our discussion.

The state of the structure is described by the following 


system of equations: 


1. 3 
dx = eee Ox. es | 2 6 
“dt [x - 37] pate, dt pate 
2 3 5 a 
1 d 4 6.2 
eS [Er -K|x Se aa Br tS a ee (7.6) 
: 6 3 
Oxere a ee OX Fela, 5 x 
a = x es m(t) Cat ET “ O SG PRL 
where all the quantities are dimensionless. The functions $q(t)$,
$p(t)$, $m(t)$, $K(t)$, $R(t)$ are specified. The controls $u(t)$ are
parameters of the structural cross-section, and in the system (7.6)
they appear as the area $F(u)$ and the moment of inertia $I(u)$ of
the section.


The boundary conditions are also Specified for the system (7.6): 


x (0) x'(0) = 2°(0) = 0, 


I 


(7.7) 
Ropes oe (1 ee ee eee Oo 


As control parameters we take the coordinates of $x(0)$ that are
not specified for $t = 0$. We treat the conditions (7.7) as terminal
equality-type constraints. Thus, the solution of the boundary
value problem, which enables us to resolve the static indeterminacy
of the structure, has been fitted into the general scheme of solving
optimal control problems.

In Figure 16 the exterior loads are balanced forces for a cir- 
cular structure of radius R,- The dimensionless parameters of the 
loadvare: “ON: =40n551. aCe) = 4\sin (2at )',2| pt), 5) 0, ME) SOle seAt 
the points |} 445) 2). where ty =O to = 0.88, the massed loads 

N are projected onto the tangent 


| and the normal to the con tour: 


x(t 40) s x(t 5-0) + Nsin (2nt,), 


Il 


x" (+0) x(t -0) + Ncos (2nt,), 


Je = 18.2 





The rectangular cross-section 


Figure 16 


: : cle 
has dimension u Xu, where ue is 


the sectional width, ut is the vertical interval (in R,-direction), 
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The sectional area and moment of inertia are computed by the formu- 


las 


f(u) = ae ? I(u) = 





its sequired to tind the functions ute), a(t) determining 
7 


the contour of the structure so as to minimize the stress: 


1 
Regattas ale ot Ed ea Ges 
O 
satisfying the conditions 
tO ee Ce fore as & 


Furthermore, we impose the constraints on the safe stresses in each 


cross section: 


2 1 2 
ee ee ies Gt) fe ec) 
0 
o being the safe stress, Ep = LOs- For the functions defining 


the constraints to be differentiable, we replace the condition 
(7.8) by an equivalent system of four inequalities: 


5 uct) [u"(t)1"0 Seer 


TC ae ene OS eae) x(t) 
0 


CE.o)) 


Prorat ical? 6 = pclae 6 He = 


The integration interval of the system (7.6) was partitioned 
into 50 parts, which led to a nonlinear programming problem with 103 


variables, 200 inequality-type constraints and three equality-type 


constraints. AS an initial approximation we took: ug (t) = 10278 
ie - 10°77. In the first series of computations a linear version of 
the system (7.6) was used: 
; ae 2 
Be = Kaa -qd, ae = axe =e G Ses Xe eM 
CLO) 


ow 
The integration of the systems (7.6), (7.10) by the Euler
scheme has yielded large errors and unsatisfactory results. The
Euler scheme with recalculation, yielding an error of order $O(h^2)$,
made it possible to complete solving the problem. The computations
started via the penalty function method OPT413, then the dual method
OPT553 was used. Solving the nonlinear problem (7.6) and the linear
problem (7.10) led, in fact, to the same answer. In particular,
the corresponding values of the functional are equal to ...
and ..., implying in turn that the design using the linear
model does not lose stability under the specified stress.
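The difference in accuracy between the two schemes can be illustrated on a scalar test equation. The sketch below assumes that "Euler scheme with recalculation" means the predictor-corrector (Heun) variant; the test equation $y' = -2ty$, $y(0) = 1$ and the function names are hypothetical choices, unrelated to the system (7.6).

```python
import math


def euler(f, y0, t_end, n):
    # Plain Euler scheme with n equal steps on [0, t_end].
    h, t, y = t_end / n, 0.0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y


def euler_with_recalculation(f, y0, t_end, n):
    # Euler predictor followed by a corrector that averages two slopes.
    h, t, y = t_end / n, 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)   # slope recomputed at the predicted point
        y += 0.5 * h * (k1 + k2)
        t += h
    return y


def f(t, y):
    return -2.0 * t * y             # exact solution: y(t) = exp(-t^2)


exact = math.exp(-1.0)
for n in (50, 100, 200):
    e1 = abs(euler(f, 1.0, 1.0, n) - exact)
    e2 = abs(euler_with_recalculation(f, 1.0, 1.0, n) - exact)
    print(n, e1, e2)
```

Halving the step should roughly halve the error of the plain scheme and reduce the error of the scheme with recalculation about fourfold, which is the behavior expected of first-order and second-order methods.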

As follows from physical considerations, the sectional width
does not increase and remains equal to the initial value. The
dependence of the optimal vertical interval $u^2$ on $t$ is shown in
Figure 17. Each of the constraints (7.9) is essential for some
values of $t$. In Figure 18, one can see how $r^1$ and $r^4$ behave
along the optimal trajectory.

[Figures 17 and 18: the optimal vertical interval $u^2(t)$ and the constraint functions $r^1$, $r^4$ along the optimal trajectory.]

From the diagram in Figure 17 we infer that the initial
design was too lightweight for the specified stress, and the
optimization raised the weight. At the same time, there are intervals
in which $u^2(t) = 10^{-2}$ and the vertical intervals can be even
smaller, which would lighten the structure. However, the lower box
constraint is essential in this case. In actual design, these
constraints are, as a rule, prescribed by engineering considerations.
The comparison of Figures 17 and 18 indicates that the increased
vertical interval $u^2$ is due to reliability requirements.
In terms of nonlinear programming the optimal solution is a boundary 
point of the feasible set given by both the box constraints and 

the constraints (7.9) along the trajectory. If there were no box 
constraints, then the maximum safe stress would be possible in each 
cross-section. This assertion corresponds to the "equirigidity" 
hypothesis, which indeed holds in the case considered. This ap- 
proach extends to designing structures whose contours can be non- 
closed or multiply connected, or can have cross-sections of a more 


complex configuration, for instance T- or I-shaped. 


8. AN APPLICATION TO DIFFERENTIAL GAMES 
1. THE STATEMENT OF THE PROBLEM 


In recent years, many articles and monographs treating differential 
games have been published. Extensive reference lists are given in 
Isaacs [1] and Krassovsky and Subbotin [1]. Not much work is, how- 
ever, available on numerical methods of solving game problems, be- 
cause of arduous computations involved in seeking global extrema of 
multivariable functions. 

Suppose the game is described by a system of differential equations


dx 
ce £CxOU FE PaCS eT); 
dat Gs.) 


Smee TC ome GE new) tui). 


lA 
ct 
IA 
Gg 
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For simplicity, we assume that the interval T is fixed and then 


proceed to discrete approximation of this system by the Euler scheme: 


= m x x om 
Aug h f(x; ,u,;,t,) F(z;) , Zz. [x;,u,,t,] ; (eno) 


a e_ y vi 
he ie eee eo) Fn Zi Lys v4 ba |e C823) 
In the system, the constraints are mixed: 
Tate Te) SetO ry Vi) a= 20 LSA eSeG (8.4) 
are 7 2 Bo il i 2 a 6 ; i 


Here, for simplicity's sake, the equality-type constraints are
omitted and no terminal constraints are specified.
The game is estimated by the function Be a? depending only 


on the terminal state coordinates. Suppose that the functions te 


‘i, 2 


g, [, f", R determining the problem are everywhere continuously 


differentiable in x. i? va We introduce the complete state 


1 z 


ae 


vectors xX, y and the complete control WECWOIES, wh finiel 72 


ZS Pee Roe = le 5 las i Wetec ae as 


q 


va [YysYgo-- VQ] ; Vi Lv goV ps seal 


The vectors $x$, $y$, $u$, $v$ need to be distinguished from the
functions $x(t)$, $y(t)$, $u(t)$, $v(t)$ in the system (8.1).
Having defined the vectors u and v, we can define aa ge 


and vq Vy6v); using the formulas of Section 6.1, we obtain 


dR LOS Ee OR errs oy 
du, Ha Py ay) Aas? dpe ea Voted 
ip Leeann x Vee 2 ee, 
ve) H (45 Py 44) ) De = HS Pa) ’ C335) 
5 
= R : Wes 
Pay me ge dul: Ca fe gee 


1S Ese 1 x xX Ziyi oa a Y. es y y 
H (25 P44) = (F(2h) ,p%,,) ; H (23 P3544) ae P(g py? 
Here the vectors $p_i^x$ and $p_i^y$ have the same dimensions as $x_i$
and $y_i$, respectively.
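Recursions of the kind given in (8.5) can be illustrated for one of the two systems on a scalar example. The sketch below is an assumption-laden illustration, not the formulas of the text: the right-hand side `f`, the terminal functional `terminal`, the step `H` and the number of nodes `Q` are hypothetical. It computes the gradient of the terminal functional with respect to the discrete controls by a backward sweep for the adjoint variable and compares one component with a central difference.

```python
import math

H = 0.1   # Euler step h (illustrative)
Q = 11    # number of nodes q (illustrative)


def f(x, u, t):
    return math.sin(x) + u          # hypothetical right-hand side


def terminal(xq):
    return xq ** 2                  # hypothetical terminal functional R


def trajectory(x1, u):
    # Euler scheme: x_{i+1} = F(z_i) = x_i + h f(x_i, u_i, t_i)
    xs = [x1]
    for i in range(Q - 1):
        xs.append(xs[i] + H * f(xs[i], u[i], i * H))
    return xs


def gradient(x1, u):
    xs = trajectory(x1, u)
    p = 2.0 * xs[-1]                # p_q = dR/dx_q
    grad = [0.0] * (Q - 1)
    for i in reversed(range(Q - 1)):
        grad[i] = H * p                         # dR/du_i = (dF/du_i) p_{i+1}
        p = (1.0 + H * math.cos(xs[i])) * p     # p_i = (dF/dx_i) p_{i+1}
    return grad


u0 = [0.3] * (Q - 1)
g = gradient(0.5, u0)

# Central-difference check of one component of the gradient.
eps = 1e-6
i = 4
up, um = u0[:], u0[:]
up[i] += eps
um[i] -= eps
fd = (terminal(trajectory(0.5, up)[-1]) - terminal(trajectory(0.5, um)[-1])) / (2 * eps)
print(g[i], fd)
```

One backward sweep yields all components of the gradient at the cost of roughly one extra integration, whereas finite differences would need one integration per control component.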
Problem I consists in finding the quantity $V_1$ -- the best
estimate of the composite function $R(x_q(u), y_q(v))$ for a player
whose behavior is described by the system (8.3):

$$V_1 = \max_v \min_u R(x_q(u), y_q(v)). \qquad (8.6)$$

Problem II consists in finding the best estimate for the
opponent:

$$V_2 = \min_u \max_v R(x_q(u), y_q(v)). \qquad (8.7)$$

In both problems the extrema are sought under the constraints (8.4). 
These are referred to as problems of finding the maximin and 

minimax estimates in the class of programmed strategies. The 

strict meaning of (8.6) and (8.7) was explained in Section 1.5. 

For the problem (8.7), in particular, the solution of the "interior"
problem determines the point-set mapping

$$v(u) = \arg\max_v R(x_q(u), y_q(v)); \qquad (8.8)$$

the solution is next sought for the "exterior" problem of finding

$$u_* \in \arg\min_u \varphi(u), \qquad \varphi(u) = R(x_q(u), y_q(v(u))), \qquad V_2 = \varphi(u_*). \qquad (8.9)$$

According to the results of Section 1.5, among the solutions
of the problems (8.6) and (8.7) we have the estimate $V_1 \le V_2$.

2. GLOBAL METHODS FOR SOLVING THE PROBLEMS 


The simplest method of finding the minimax $V_2$ is based on
minimization methods in a space of reduced dimension (see Section 2.6).
For fixed $u$ one has an interior problem and needs to define
$v \in v(u)$, that is, a standard discrete optimal control problem. We
assume that for each $u$ the problem has a solution. Then we arrive
at the problem of minimizing the composite function $\varphi(u)$,
reducible to finding the sequence $u_k$ converging to $u_*$. If global
methods are used to solve both interior and exterior problems, the
minimax (8.7) thus found is a global solution. However, numerical
implementation requires enormous amounts of computations and, in
reality, only the simplest game problems can be solved on currently
available computers. It is expedient to make simplifying
assumptions, and this can help solve high-dimension problems,
shrinking, however, the class of the problems.


3. NUMERICAL METHODS FOR FINDING A LOCAL MINIMAX


We are to solve the interior and exterior problems (8.8) and (8.9)
using local methods, to obtain the local minimax. This approach
requires a thorough analysis of the solutions obtained; one can do
it only for an adequate initial approximation and a unique,
continuous dependence $v(u)$. The interior problem (8.8) is solved for
each complete control vector $u_k$. We assume that its solution
$v_k = v(u_k)$ exists, as well as the dual vector
$\lambda_k = \lambda(u_k)$, and the necessary conditions of the
maximum hold at the Kuhn-Tucker point $[v_k, \lambda_k]$:


d 2 
ay PAG ¥g()) = 4T Or ,), Wy. ae) = 208, 
2 2 
PE) emer svi) Terms ie Sa 
where [= is othe gra coordinate of the vector re 


To solve the interior problem (8.8) one can apply any of the
methods described in Section 6.5. Usually a combination of
different methods is needed: the first several steps by the penalty
function method are followed by the dual methods. Thus, the values
of the function $\varphi(u)$ become known. The constrained minimization
of $\varphi(u)$ is carried out by the same methods. When the function
$\varphi(u)$ is not differentiable, a version of the penalty method
involving nondifferentiable penalty functions is, for example,
employed.

To implement methods using derivatives, an assumption is in
order concerning the differentiability of the functions $v(u)$,
$\lambda(u)$, which will hold, in particular, when the conditions of
Theorem 1.7.7 are satisfied. The differentiability of $v(u)$,
$\lambda(u)$ implies that $y_q(v(u))$ is differentiable. The systems
(8.2) and (8.3) are integrated independently and are related only
through the terminal functional $R$. Hence, instead of the functions
$v(u)$, $\lambda(u)$, $y_q(u)$ we can use the dependence $v(x_q)$,
$\lambda(x_q)$, $y_q(x_q)$. In order to take advantage of the
formulas (8.5), we need to compute


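$$\frac{dR(x_q, y_q(x_q))}{dx_q} = \frac{\partial R(x_q, y_q)}{\partial x_q} + \frac{dy_q}{dx_q}\,\frac{\partial R(x_q, y_q)}{\partial y_q}.$$

We showed in Section 1.7.6 that the latter summand is zero.
Therefore

$$\frac{dR}{dx_q} = \frac{\partial R(x_q, y_q)}{\partial x_q}.$$
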

Notes chia Geant ra depended explicitly on u, additional terms would 
then appear in this formula. 
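The vanishing of the second summand can be checked numerically on a toy payoff. In the sketch below the payoff `R`, its partial derivatives and the inner solver `inner_max` are hypothetical, and the interior problem is unconstrained; the derivative of the composite function obtained by finite differences is compared with the partial derivative taken at the interior maximizer.

```python
def R(x, y):
    return x * y - y ** 2 + x ** 2      # hypothetical payoff, strictly concave in y


def dR_dy(x, y):
    return x - 2.0 * y


def dR_dx(x, y):
    return y + 2.0 * x


def inner_max(x, y=0.0, step=0.25, iters=200):
    # Interior problem: gradient ascent in y for a fixed x.
    for _ in range(iters):
        y += step * dR_dy(x, y)
    return y


def phi(x):
    # Composite function: payoff evaluated at the interior maximizer.
    return R(x, inner_max(x))


x = 0.8
y_star = inner_max(x)
envelope = dR_dx(x, y_star)                             # partial derivative only
finite_diff = (phi(x + 1e-5) - phi(x - 1e-5)) / 2e-5    # total derivative of phi
print(envelope, finite_diff)
```

For this payoff the maximizer is $y = x/2$, so $\varphi(x) = 1.25x^2$ and both numbers should be close to $2.5x$.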
This procedure is implementable for solving the problem (8.7),
but it may involve elaborate computations. Hence it is more
appropriate to use the method (2.6.2) requiring no solution of the
interior problem. In that case, a transformation of the controls
$u$ and $v$ is made:

$$u_{k+1} = u_k - \alpha \frac{dR}{du}, \qquad v_{k+1} = v_k + \alpha \frac{dR}{dv}. \qquad (8.10)$$

Here $0 < \alpha < 1$ and, for simplicity, the constraints (8.4) have
been omitted.
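A simultaneous descent in $u$ and ascent in $v$ of this type can be sketched on a scalar saddle function. The function below, its gradients, the step `alpha` and the starting point are hypothetical; the example only shows that no interior problem is solved along the way.

```python
def grad_u(u, v):
    # d/du of R(u, v) = (u - 1)^2 - (v + 0.5)^2 + u v  (hypothetical payoff)
    return 2.0 * (u - 1.0) + v


def grad_v(u, v):
    # d/dv of the same payoff
    return -2.0 * (v + 0.5) + u


def descent_ascent(u, v, alpha=0.1, iters=500):
    for _ in range(iters):
        # Simultaneous update: both gradients use the old pair (u, v).
        u, v = u - alpha * grad_u(u, v), v + alpha * grad_v(u, v)
    return u, v


u_star, v_star = descent_ascent(3.0, -2.0)
print(u_star, v_star)
```

For this payoff, which is convex in $u$ and concave in $v$, the saddle point is $(1, 0)$. On payoffs without such structure the iteration may cycle or diverge, so the sketch should not be read as a general convergence claim.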


4. NECESSARY MINIMAX CONDITIONS


We have described local and global procedures for solving the
problem (8.7). An intermediate procedure is also possible, when the
interior problem is solved through a global method and the exterior
problem through local procedures. In that case, one can use
properties similar to the discrete minimum principle of Section 6.4.
We discuss this problem briefly, ignoring the constraints (8.4).
Let the control process be described by the difference relations
(8.2) and (8.3). We stipulate the constraints $u_j \in U$,
$j \in [1:q]$, on controls, $v \in V$. Let us fix all the components
except $u_i$ of the feasible complete control vector $u$. Furthermore,
we consider the problem of finding


$$V_2 = \min_{u_i \in U} \max_{v \in V} R(x_q(u_i), y_q(v)). \qquad (8.11)$$

In the interior problem this determines

$$v(u_i) = \arg\max_{v \in V} R(x_q(u_i), y_q(v)). \qquad (8.12)$$

In the exterior problem we have the set of solutions

$$W_* = \arg\min_{u_i \in U} \varphi(u_i), \qquad \varphi(u_i) = R(x_q(u_i), y_q(v(u_i))).$$
We assume that the following conditions are satisfied:

1. the vector function $F(x_i, u_i, t_i)$ and the function
$R(x_q, y_q)$ are continuously differentiable with respect to the
components of the vector $x$;

2. for each $u_i \in U$ the interior problem has a solution;
the condition (8.12) defines the point-set mapping $v(u_i)$
associated with the set of terminal points $y_q(u_i)$.


Given the feasible vector a U, we can uniquely determine 


a from (8.2); by solving the interior problem we determine 


UG tae The problem (8.11) becomes equivalent to the following 
problem: 
V = man dels IOs bla Ne AV) a C313) 
2 Gri SLs ead 
u, €U Va 4 
We fix the point u, « U and the point y, « Yq (uy): Then ~ 


q 
uSing the formulas (8.5), we successively determine the vectors 


Ke ex eG p a 
Py? Paste ert Peay depending on Sade F(x; ,u,,t;) and on Vq 
parametrically. In particular, we obtain 
ahi (Gar, 
oe Gyep¥g) = geear ik 
ag i+t 


Just as in Section 6.4, in solving the exterior problem we go 
over to the state space. Then the problem (8.13) is equivalent to 


the following problem: 


V = min max Rewaavee) 
2 Gd 
X54 Ev HEV g (Uy) 
Here the set % = F(x, ,U,t;) is convex and compact. By Theorem 
= * * i i 
DEO se O mal avic xR F(x; ,u*,t,), Ng as Wop tt 2S" necessary 


that the condition 


x 
nes ey mere nee 
Y EV q (uF) 
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be Satisfied for any X541 ° 2, or passing to the control space, 


WEL Clwesig qo, sere shill UL ie U we have the inequality 


x * 2) = Pee =" 10 

a, ba GE a (me, U, te) Ce, Us te)? 
qa°qei ° (8.14) 

We have thus arrived at the following theorem, known as the
discrete minimax principle.

THEOREM 6.8.1. For the systems (8.2) and (8.3) let conditions 1
and 2 hold and let the set $F(x_i, U, t_i)$ be convex and compact.
Then, in order that $u_i^* \in W_*$, it is necessary and, if
$\varphi(u_i)$ is convex, also sufficient that the condition (8.14)
holds for any $u_i \in U$.

The following assertion is an analog of the linearized minimum 


OTE Ca pls Seen hOdeura ny: u; ¢ U we have the inequality 


x 
* * - uk 
max (P(x, Ud t, Dey (FCx, qots)> Viger u. us) SO) 


Fl 
y ©€y U3 


These results make it possible to use the versions of the
conditional gradient method and of the gradient projection method, as
indicated in Section 6.5. The present problem is, however, more
complex. In the preceding case, $p_{i+1}$ was determined by the
relations (1.5) and (1.7), whereas now, if the interior problem has
a nonunique solution, one needs to consider the set of vectors
$p_{i+1}$ defined by (8.5) for different values of $y_q$.

Taking this approach, it is easy to take the state space
constraints into account and obtain the necessary minimax conditions.
5. AN EXAMPLE OF NUMERICAL COMPUTATIONS


These methods are extendable to the case where the equations (8.1)
are not separate and there are no state space constraints. For
illustration, we have taken from Isaacs [1] the dolichobrachistochrone
problem. The game is described by the system


Ox lam BY, 
at = y cos u a Do. ’ 
va ‘ v-1 
at = VY Sin uu + ere , 


where $u$ and $v$ are controls satisfying the constraints:
$0 \le u \le 2\pi$, $-1 \le v \le 1$.

A player controlling the function $u$ tries to bring the state
vector onto the set

$$M = \{[x, y] : x = 0, \; y \ge 0\}$$



in minimal time. His opponent handling the function v tries to 
prevent the entry on M or at least to delay it. Numerical solu- 
tion of this problem begins by the method (8.10) and continues by 
the conditional gradient method, as indicated in the preceding 
subsection. 

The results are diagrammed in Figure 19 as state trajectories
on the plane $[x, y]$. The breakpoints of the trajectory lie on the
so-called switching parabola, at which the second player -- the
"opponent" -- switches his control from $-1$ to $+1$. It is worthwhile
to compare the diagram in Figure 19 with the similar diagram in
Isaacs [1], where the author seeks the solution
of the game in closed-loop strategies. The state trajectories in
the area above the switching parabola exactly correspond to the
analytic solution obtained by Isaacs. However, the state
trajectories below the parabola have a different form. Chigir' [1]
points out the errors in Isaacs' qualitative description of the
trajectory field. However, Chigir' has drawn incorrectly the
state trajectories close to the set $M$, which should rather be
perpendicular to the y-axis. One can easily see this from the
analytic formulas and numerical results graphed in Figure 19.

[Figure 19: state trajectories of the game on the plane $[x, y]$.]


Chapter 7


SEARCH FOR GLOBAL SOLUTIONS


In the majority of practical optimization problems, it is required
to determine global solutions, and only in rare cases do local
solutions suffice. Minimization of a convex function on a convex
set, for example, can be solved using local methods since in this
case the local minimum coincides with the global one. Numerical
methods for seeking global solutions of multivariable problems,
in spite of their practical importance, have been rather poorly
developed. This is no doubt due to their exceedingly great
complexity. We shall not detail all the available approaches to this
problem. Instead, we shall concentrate on one most promising
direction, which is based on the idea of a non-uniform covering
of a feasible set. This approach has turned out to be quite
universal and, as we shall show, can be used not only for seeking
global extrema of multivariable functions but also for nonlinear
programming problems, for solving systems of equations and, most
importantly, for multicriteria optimization. Problems which are
solvable in reasonable computer time must be of limited dimension
(of order 10 to 20); however, the use of multiprocessors, parallel
computing and distributed processing substantially increases the
possibilities of this approach.
The development of global methods calls for a new view on
the numerical methods. Not only should these techniques yield the
global solution, but it must also be verified that the solution is
actually global; this requirement is crucial in problems of
operations research and game theory, where the so-called guaranteed
maximin and minimax estimates are required.

These demands on numerical methods compel us to reevaluate
the available approaches to optimization problems. The local
numerical methods described in the preceding chapters, such as the
penalty function method or the parametrization method, are hardly
suitable for finding the global minimum directly (for more detail,
see Sections 3.4, 4.3, 6.2). However, utilization of local
optimization methods as auxiliary aids to the basic method of global
search substantially improves its efficiency.


1. THE GENERAL NOTION OF COVERINGS


1. THE STATEMENT OF THE PROBLEM


We examine first the problem of finding the global minimum of a
multivariable function $f(x)$ on a feasible compact set $X \subset R^n$:

$$f_* = \min_{x \in X} f(x). \qquad (1.1)$$

In numerical computations this problem is usually simplified:
one introduces the so-called set of $\varepsilon$-approximate solutions:

$$X_*^\varepsilon = \{x \in X : f(x) \le f_* + \varepsilon\}. \qquad (1.2)$$

Here $\varepsilon > 0$ is the specified accuracy of the computation. It is
required to find some point $x_* \in X_*^\varepsilon$. In other words, it is
necessary to find with the given accuracy the global minimum value of
a function in $n$ variables and at least one point $x_*$ at which
the approximate value is reached. Let us delineate a few numerical
methods for solving this problem.


2. RANDOM SEARCH


We randomly pick $k$ feasible points $x_1, x_2, \ldots, x_k$. As a solution
we take the point at which the minimal value $f(x_i)$ is attained,
$i \in [1:k]$. The number of test points $k$ is determined in such a
way that the probability that at least one of them belongs to
$X_*^\varepsilon$ is sufficiently large. Obviously, the method is most
effective when the ratio of the measure of the set $X_*^\varepsilon$ to
that of $X$ is not small.
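The dependence of the number of test points on the measure ratio can be made concrete under one extra assumption that the text does not state: the points are drawn independently and uniformly over a box. If the set of $\varepsilon$-approximate solutions occupies a fraction `ratio` of the box, at least one of $k$ points falls into it with probability $1 - (1 - \text{ratio})^k$. The names `random_search` and `points_needed` and the test function are hypothetical.

```python
import math
import random


def random_search(f, lower, upper, k, rng):
    # Evaluate f at k uniform random points of the box and keep the best one.
    best_x, best_f = None, math.inf
    for _ in range(k):
        x = [rng.uniform(a, b) for a, b in zip(lower, upper)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def points_needed(ratio, confidence):
    # Smallest k with 1 - (1 - ratio)^k >= confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - ratio))


def f(x):
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2   # hypothetical test function


rng = random.Random(0)
k = points_needed(ratio=0.01, confidence=0.99)
x_best, f_best = random_search(f, [-1.0, -1.0], [1.0, 1.0], k, rng)
print(k, x_best, f_best)
```

The required $k$ grows roughly like the reciprocal of the measure ratio, which is why the method degrades quickly when the set of approximate solutions is small relative to $X$.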
3. RANDOM SEARCH USING LOCAL PROCEDURES 

As in the case above, we again define $k$ feasible points. Then,
using local methods for finding the minimum of a multivariable
function, we look for the local minima of $f(x)$ on $X$, taking
$x_1, x_2, \ldots, x_k$ as initial points. As a result, we get the
points $\bar x_1, \bar x_2, \ldots, \bar x_k$; as the solution $x_*$ we
take the point at which $f(\bar x_i)$ has the minimal value. The number
of points $k$ necessary for the implementation of the method is
determined by the condition of guaranteeing a sufficiently large
probability that among the initial points $x_1, x_2, \ldots, x_k$
there is at least one point falling in the region of attraction (the
region of local convergence) of at least one point of the set of
global solutions of the problem (1.1). It is appropriate to use this
method when the number of local minima in the problem is not great.
The method has two drawbacks. First, it does not guarantee finding a
global extremum; it only gives a probability of that event, and
achieving a given probability usually requires many calculations.
Second, the local minimization methods may repeatedly find the same
local minima, which needlessly increases the number of calculations.
To devise more effective algorithms, more stringent conditions
need to be imposed on the function to be minimized.
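
A minimal sketch of random search combined with local descent is given below. The text does not prescribe a particular local method, so a simple derivative-free compass search on a box is used here purely as a stand-in; `compass_search` and `multistart` are hypothetical names, and the box $a \le x \le b$ is an assumption about the feasible set.

```python
import random


def compass_search(f, x0, a, b, step=0.25, tol=1e-6):
    """Stand-in local method: coordinate moves inside the box a <= x <= b."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] = min(max(y[i] + d, a[i]), b[i])   # stay feasible
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5          # refine the mesh when no move helps
    return x, fx


def multistart(f, a, b, k, seed=0):
    """Run the local method from k random starts; return the best run and all runs."""
    rng = random.Random(seed)
    runs = []
    for _ in range(k):
        x0 = [rng.uniform(lo, hi) for lo, hi in zip(a, b)]
        runs.append(compass_search(f, x0, a, b))
    return min(runs, key=lambda run: run[1]), runs


if __name__ == "__main__":
    two_wells = lambda x: (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]
    best, runs = multistart(two_wells, [-2.0], [2.0], k=20)
    print(best)
    print(sorted(round(run[0][0], 2) for run in runs))
```

Printing all the runs shows the second drawback mentioned above: with only two local minima, most of the twenty descents rediscover a minimum that has already been found.
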


4. THE METHOD OF COVERING


We present the method in stages. First we give the general idea
of the method and a little later give a more specific description
of it.

Let the values of $f(x)$ be computed at a sequence of feasible
points $\{x_k\} = [x_1, x_2, \ldots, x_k]$. Define the quantity

$$R_k = \min [f(x_1), f(x_2), \ldots, f(x_k)] \tag{1.3}$$

and call it a record; we call any point $x_r \in \{x_k\}$ satisfying
$R_k = f(x_r)$ a record point. Define the set

$$Z_k = \{x \in R^n : R_k - \varepsilon \le f(x)\}. \tag{1.4}$$

Obviously,

$$R_k - \varepsilon \le \min_{x \in Z_k} f(x).$$

The points of $Z_k$ are of no interest for finding the global
minimum, since the exact global minimum of $f$ on $Z_k$ can improve
the value of the record $R_k$ by no more than $\varepsilon$. Hence
the set $Z_k$ can be omitted from consideration and it is sensible
to continue the search for the minimum only on the set
$X \setminus Z_k$. In particular, if

$$X \subset Z_k, \tag{1.5}$$

then the initial problem is solved and the record point $x_r$ is
taken as an approximate solution; it is guaranteed that
$x_r \in X_\varepsilon$.

Thus the problem of finding the global extremum has been reduced
to constructing a sequence of points $\{x_k\}$ satisfying (1.5).
The sequence $R_k$ is monotonically decreasing and the sequence
of sets $Z_k$ is monotonically increasing, i.e.,

$$R_{k+1} \le R_k, \qquad Z_k \subset Z_{k+1}. \tag{1.6}$$


5. LOCAL METHODS


The set $Z_k$ essentially depends on the value of the record $R_k$,
and it is greatest for $R_k = f_*$, the value $f_*$ being usually not
known. Hence, to extend the set $Z_k$ it is desirable that the
record be as close as possible to $f_*$. The sequence $\{x_k\}$ will
be chosen so as to guarantee the condition (1.5), while to extend
the set $Z_k$ we use auxiliary procedures for finding a local
minimum in the problem (1.1). For each $x_k$ such that

$$f(x_k) < R_{k-1}, \tag{1.7}$$

one goes to the program of local search for the minimum; if one
thereby obtains a point $\bar x_k$ at which $f(\bar x_k) < f(x_k)$,
then as $R_k$ one takes the quantity $f(\bar x_k)$. This technique
substantially expedites the computations, and will be used in the
sequel. The conditions (1.6) are preserved in this case.


6. FUNCTIONS SATISFYING A LIPSCHITZ CONDITION


In numerical computations, finding $Z_k$ from (1.4) is difficult. It
becomes, however, easier under the assumption that $f(x)$ satisfies
a Lipschitz condition on $X$ with constant $\ell$, i.e., for
any $x$ and $y$ from $X$ one has

$$|f(x) - f(y)| \le \ell \|x - y\|. \tag{1.8}$$

Then

$$f(x) \ge f(y) - \ell \|x - y\|, \tag{1.9}$$

yielding that the inequality

$$f(x) \ge R_k - \varepsilon \tag{1.10}$$

will be satisfied for all $x$ satisfying

$$f(y) - \ell \|x - y\| \ge R_k - \varepsilon. \tag{1.11}$$

Let $x_j$ belong to the sequence $\{x_k\}$, $j \le k$. Introduce the
ball $B_{jk}$ and the enveloping sphere $S_{jk}$:

$$B_{jk} = \{x \in R^n : \|x - x_j\| \le r_{jk}\}, \qquad
  S_{jk} = \{x \in R^n : \|x - x_j\| = r_{jk}\}. \tag{1.12}$$

The center of $B_{jk}$ is at the point $x_j$, and the radius of
$B_{jk}$ is

$$r_{jk} = \rho + \frac{f(x_j) - R_k}{\ell}, \qquad
  \rho = \frac{\varepsilon}{\ell}. \tag{1.13}$$

Setting $y = x_j$ in (1.11), we see that (1.10) holds for all
$x \in B_{jk}$. The set $Z_k$ contains the union of all such balls:

$$\bigcup_{j=1}^{k} B_{jk} \subset Z_k.$$

Therefore the condition (1.5) is satisfied automatically if

$$X \subset \bigcup_{j=1}^{k} B_{jk}. \tag{1.14}$$

This representation suggests a constructive method for solving
the problem. In broad terms, the method consists in the following.
Suppose that for some sequence $\{x_k\}$ the record $R_k$ is
determined from (1.3). Let the sequence of points $\{x_k\}$ and the
radii $r_{jk}$ of the spheres be stored in computer memory. If at the
new point $x_{k+1}$ we have $f(x_{k+1}) < R_k$, we set
$R_{k+1} = f(x_{k+1})$ and replace the entire sequence $r_{jk}$ by
the sequence $r_{j(k+1)}$. If the union of the balls $B_{j(k+1)}$
covers the set $X$, the computations end; otherwise, we take a new
point $x_{k+2}$, and so on. The set $X$ is thus covered by balls of
different radii. The smallest radius occurs at the points $x_j$ at
which $R_k = f(x_j)$, where $r_{jk} = \rho$. If $X$ is bounded, such a
covering is accomplished in a finite number of steps.
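
The radii (1.13) and the covering test (1.14) can be illustrated in the simplest setting. The sketch below assumes that $X$ is a segment $[a, b]$ of the real line, so that the balls are intervals; `ball_radii` and `covers_interval` are hypothetical helper names introduced only for this illustration.

```python
def ball_radii(fvals, eps, lip):
    """Radii r_jk = rho + (f(x_j) - R_k) / l with rho = eps / l, as in (1.13)."""
    record = min(fvals)                      # the record R_k of (1.3)
    rho = eps / lip
    return [rho + (fv - record) / lip for fv in fvals], record


def covers_interval(centers, radii, a, b):
    """True if the union of the intervals [c - r, c + r] contains [a, b]."""
    reach = a                                # [a, reach] is covered so far
    for lo, hi in sorted((c - r, c + r) for c, r in zip(centers, radii)):
        if lo > reach:                       # a gap before the next interval
            return False
        reach = max(reach, hi)
        if reach >= b:
            return True
    return reach >= b


if __name__ == "__main__":
    f = lambda x: x                          # Lipschitz constant 1 on [0, 1]
    points = [0.05, 0.5]
    radii, record = ball_radii([f(p) for p in points], eps=0.1, lip=1.0)
    print(radii, record, covers_interval(points, radii, 0.0, 1.0))
```

The point with the record value receives the smallest radius $\rho$, while a point whose value lies far above the record excludes a much larger neighbourhood.
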

Without changing the formulas above, one can replace the
covering by balls by a covering by $n$-dimensional cubes whose
faces are parallel to the coordinate planes. To this end, it
suffices to assume that the Chebyshev norm is used in the
Lipschitz condition (1.8) (see Appendix II). Then the "ball"
$B_{jk}$ is the collection of points $x$ satisfying

$$\max_{s \in [1:n]} |x^s - x_j^s| \le r_{jk}, \tag{1.15}$$

i.e., $B_{jk}$ turns into an $n$-dimensional cube. If the norm in
(1.8) is not Chebyshev, then from the equivalence of norms (see
Appendix II) one can redefine the Lipschitz constant $\ell$ and use
the Chebyshev norm. For example, if the Euclidean norm is used in
(1.8), then in (1.15) we need to take $\ell\sqrt{n}$ instead of $\ell$.

This approach to solving the problem is not complete since
it is not clear how to obtain a covering of $X$ satisfying the
condition (1.14). We shall discuss this question in Section 7.3 and
show that under very general assumptions the problem can be reduced
to that of covering a parallelepiped containing the set $X$.


7. FUNCTIONS WHOSE GRADIENT SATISFIES A LIPSCHITZ CONDITION


Suppose the function $f$ is differentiable in $X$ and for any $x$
and $y$ of the convex compact set $X$ we have

$$\|f_x(x) - f_x(y)\| \le M \|x - y\|, \tag{1.16}$$

where $M$ is constant. We will find estimates for functions of
this class. It is easy to show that (1.16) implies that $f(x)$
and $\|f_x(x)\|$ are continuous and bounded on $X$. We use the
Newton-Leibniz formula (see Appendix II):

$$f(x) = f(x_j) + \int_0^1 \langle f_x(x_j + \tau (x - x_j)),\; x - x_j \rangle \, d\tau.$$

Adding and subtracting on the right side the scalar product
$\langle f_x(x_j), x - x_j \rangle$ and using (1.16), we obtain

$$f(x_j) + \langle f_x(x_j), x - x_j \rangle - \frac{M}{2} \|x - x_j\|^2 \le f(x).$$

By the Cauchy inequality we have

$$f(x_j) - \|f_x(x_j)\| \, \|x - x_j\| - \frac{M}{2} \|x - x_j\|^2 \le f(x).$$

Using these inequalities, we obtain that (1.10) is satisfied
for all $x$ satisfying either the inequality

$$\frac{M}{2} \|x - x_j\|^2 - \langle f_x(x_j), x - x_j \rangle \le f(x_j) - R_k + \varepsilon \tag{1.17}$$

or the inequality

$$\frac{M}{2} \|x - x_j\|^2 + \|f_x(x_j)\| \, \|x - x_j\| \le f(x_j) - R_k + \varepsilon. \tag{1.18}$$

The boundary of the set (1.17) is the $n$-dimensional sphere
$S_{jk}$ centered at $\tilde x_j = x_j + \frac{1}{M} f_x(x_j)$ with
radius $r_{jk}$ given by

$$r_{jk} = \frac{1}{M} \sqrt{\|f_x(x_j)\|^2 + 2M (f(x_j) - R_k + \varepsilon)}.$$

The boundary of the set (1.18) is the $n$-dimensional sphere
$\bar S_{jk}$ centered at $x_j$ with radius $\rho_{jk}$ equal to

$$\rho_{jk} = r_{jk} - \frac{1}{M} \|f_x(x_j)\|.$$

The shortest distance from $x_j$ to the sphere $S_{jk}$ is
$\rho_{jk}$. Hence the sphere $\bar S_{jk}$ is inscribed in the
sphere $S_{jk}$, and the points of the ball (1.18) lie in the ball
(1.17).

Let $B_{jk}$ and $\bar B_{jk}$ denote the sets of points in the balls
(1.17) and (1.18), respectively. The union of the balls $B_{jk}$
lies in the set $Z_k$ defined by (1.4). A numerical method of
solving the problem consists in determining the sequence $\{x_k\}$
satisfying the condition (1.14).
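
The two radii can be computed directly from (1.17) and (1.18) by completing the square in the first inequality and solving the quadratic in $\|x - x_j\|$ in the second. The sketch below is only an illustration of that arithmetic; the function name `gradient_ball_radii` and its argument names are hypothetical.

```python
import math


def gradient_ball_radii(x_j, f_j, grad_j, record, eps, M):
    """Center and radius of the ball (1.17), and the radius of the ball (1.18)."""
    g = math.sqrt(sum(c * c for c in grad_j))        # norm of the gradient at x_j
    slack = f_j - record + eps                       # right-hand side of (1.17), (1.18)
    # (1.17): (M/2)|x - x_j - grad/M|^2 <= slack + g^2 / (2M)
    center = [xi + gi / M for xi, gi in zip(x_j, grad_j)]
    r = math.sqrt(g * g + 2.0 * M * slack) / M
    # (1.18): positive root of (M/2) t^2 + g t - slack = 0
    rho = r - g / M
    return center, r, rho


if __name__ == "__main__":
    # f(x) = x1^2 + x2^2 has a gradient with Lipschitz constant M = 2.
    print(gradient_ball_radii([1.0, 0.0], 1.0, [2.0, 0.0],
                              record=0.0, eps=0.1, M=2.0))
```

Since the center of the larger ball is displaced from $x_j$ by exactly $\|f_x(x_j)\|/M$, the smaller ball touches the larger one from inside, in agreement with the statement that one sphere is inscribed in the other.
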


2. COVERING A PARALLELEPIPED 


1. THE STATEMENT OF THE PROBLEM


We consider a particular case of the problem (1.1) of finding

$$f_* = \min_{x \in P} f(x), \tag{2.1}$$

where $P$ is an $n$-dimensional parallelepiped

$$P = \{x \in R^n : a \le x \le b\}. \tag{2.2}$$

The set of approximate solutions in the problem (2.1) is given by
the formula (1.2), where $P$ is taken as $X$. The function $f$
satisfies the Lipschitz condition (1.8) with the Euclidean norm.
Then the cube inscribed in the sphere $S_{jk}$ is defined as follows:

$$V_{jk} = \{x \in R^n : x_j - r_{jk} e \le x \le x_j + r_{jk} e\}, \tag{2.3}$$

where

$$r_{jk} = \rho + \frac{f(x_j) - R_k}{\ell \sqrt{n}}, \qquad
  \rho = \frac{\varepsilon}{\ell \sqrt{n}}, \tag{2.4}$$

and $e \in R^n$ is the vector with all coordinates equal to 1. In
this section $r_{jk}$ and $\rho$ thus denote the quantities (1.13)
divided by $\sqrt{n}$. The center of the cube $V_{jk}$ is at $x_j$,
the length of the edge is $2 r_{jk}$, and the lateral faces are
parallel to the coordinate axes. According to the arguments in the
preceding section, if $x_j$ and $R_j$ are known, then one can exclude
the cube $V_{jj}$ from the parallelepiped $P$. If it happens that
$R_k < R_j$ for some $k > j$, then one can omit the bigger cube
$V_{jk}$ containing the cube $V_{jj}$. From (2.4) it is not hard to
derive a formula for recomputing the half-length of the edge of the
new enlarged cube:

$$r_{jk} = r_{jj} + \frac{R_j - R_k}{\ell \sqrt{n}}. \tag{2.5}$$

The problem (2.1) will be solved if the sequence of cubes
$V_{jk}$ completely covers the parallelepiped $P$, i.e.,

$$P \subset \bigcup_{j=1}^{k} V_{jk}. \tag{2.6}$$


A great variety of such coverings are possible. We shall 
describe one which had to satisfy two conditions: the smallest 
possible computer memory and the simplest possible computer pro- 
gram should be used. The first condition was due to the fact 
that the computations were run on a computer with a relatively 
small core memory. The second condition could be met because 
ALGOL-60 in which the program was written allows recursive proced- 
ures. Relaxing these conditions would undoubtedly permit develop- 


ing more sophisticated ways of covering. 


2. THE METHOD OF COVERING 


This author developed several variants of the method of covering
and was also fully acquainted with other authors' methods. The
simplest and least ambiguous method was chosen in preference to the
others. We shall now describe it, following the program's
implementation as closely as possible.

The method is implemented through a recursive procedure (we
use ALGOL-60 terminology), i.e., with a recursive call of the
basic program. The procedure is referred to as $N(i,a,b)$,
where $i \in [1:n]$ is the index of the coordinate of the vector
$x$ which changes in the process of the covering. In the primal
version of the method the vectors $a$ and $b$ coincide with the
corresponding vectors in (2.2). We now introduce the auxiliary
variable vector $v_j \in R^n$. First we take all the components of
$v_1$ equal to $u$, where $u$ is any sufficiently large number
($u > \max_{i \in [1:n]} [b^i - a^i]$). Let some initial point
$x_1 \in P$ be known that is the initial approximation to the
solution of the problem (2.1). Let

$$R_1 = f(x_1), \qquad x_2 = a + \rho e.$$

We construct a sequence of points $x_2, x_3, \ldots, x_k$ from $P$
satisfying the covering condition (2.6) using a recursive procedure,
which we now describe.





THE PROCEDURE $N(i,a,b)$. At input time the vectors $x_s$, $v_s$, the
record $R_{s-1}$ and the value $\rho$ are known.
Suppose

$$x_s^i < b^i + \rho. \tag{2.7}$$

Then we perform the following operations. If $x_s^i > b^i$, we set
$x_s^i = b^i$. First we consider the case $i = 1$.
We compute $f(x_s)$. If it turns out that

$$f(x_s) < R_{s-1}, \tag{2.8}$$

we take $R_s = f(x_s)$ as the record; otherwise $R_s = R_{s-1}$.
If, in addition, the condition (2.9) is satisfied, where $\delta$ is
some given number, we turn to the problem of finding a local
minimum in (2.1). As a result, we get a point $\bar x_s$, where
$f(\bar x_s) \le R_s$. We take $\bar x_s$ as the record point, setting
$R_s = f(\bar x_s)$ and keeping the point $x_s$. The quantity $\delta$
measures the accuracy of the local minimum.

If one of the conditions (2.8) or (2.9) holds, we redefine $v_s$.
In accordance with (2.5) we take

$$v_s^j + \frac{R_{s-1} - R_s}{\ell \sqrt{n}} \tag{2.10}$$

as the $j$-th coordinate of $v_s$.

We define

$$h_s = 2\rho + \frac{f(x_s) - R_s}{\ell \sqrt{n}}. \tag{2.11}$$

Set

$$x_{s+1}^1 = x_s^1 + h_s, \qquad v_{s+1}^1 = \min [v_s^1; h_s]. \tag{2.12}$$

All components of $x_{s+1}$ and $v_{s+1}$, except the first ones that
have been redefined here, are the same as the corresponding
components of $x_s$ and $v_s$.

When $i = 1$, control is transferred at this point to the
beginning of the procedure, where the condition (2.7) is checked.
The process is continued until (2.7) is violated, and then the
procedure $N(1,a,b)$ stops computing. We take $x_k$ to be the last
point for which (2.7) was satisfied, in spite of the fact that
$x_{k+1}$ was defined by (2.12): at $x_{k+1}$ the value of $f$ was
not calculated, so it may be viewed only as a test point.

Now consider the case $i > 1$.

We call the procedure $N(i-1,a,b)$. Upon completion, we
obtain the vectors $x_j$, $v_j$ and then set

$$v_{j+1}^i = \min [v_j^i, v_j^{i-1}], \qquad
  x_{j+1}^i = x_j^i + v_j^{i-1}. \tag{2.13}$$

The remaining components of the vectors $x_{j+1}$ and $v_{j+1}$ are
the same as the corresponding components of $x_j$ and $v_j$. After
this, control is transferred to the beginning of the procedure
$N(i,a,b)$, where (2.7) is checked. The process is continued until
(2.7) is violated. Then $N(i,a,b)$ stops computing.

It has been assumed throughout that the current points $x$
and $v$, the record point, and the record are global variables,
i.e., any changes in them in the process of running $N(i,a,b)$
are reflected in all procedures $N(s,a,b)$ ($i, s \in [1:n]$).

The computations are initiated by calling $N(n,a,b)$. As a
result, a sequence of points satisfying the condition (2.6) is
produced.


This concludes the formal description of the procedure $N$.
We next illustrate how it works, first for a function of one
variable.


3. THE ONE-DIMENSIONAL CASE


Let $n = 1$ in problem (2.1). The cube (2.3) in this case is the
segment

$$V_{ik} = \{x \in R : x_i - r_{ik} \le x \le x_i + r_{ik}\},$$

where $r_{ik} = \rho + (f(x_i) - R_k)/\ell$. For simplicity, we omit
the step of using local auxiliary methods, and from (2.10)-(2.12) we
obtain the following formulas:

$$x_2 = a + \rho, \qquad R_s = \min [R_{s-1}, f(x_s)], \qquad
  h_s = 2\rho + \frac{f(x_s) - R_s}{\ell}. \tag{2.14}$$

If $R_s < R_{s-1}$, we take $v_s + (R_{s-1} - R_s)/\ell$ as $v_s$;
otherwise $v_s$ does not change. Next we have

$$x_{s+1} = x_s + h_s, \qquad v_{s+1} = \min [v_s, h_s]. \tag{2.15}$$

The method stops computing as soon as $k$ is found such that

$$x_k < b + \rho \le x_{k+1}. \tag{2.16}$$

Thus, this method produces a monotonically increasing sequence
of points $\{x_s\}$ with variable step $h_s$, which is distinct from
$r_{ss}$. This is because after computing the value of $f$ at $x_s$
the next calculation of $f$ will be made at $x_{s+1}$, where
automatically $r_{(s+1)(s+1)} \ge \rho$. Hence we can omit at least
the segments

$$|x - x_s| \le r_{ss}, \qquad |x - x_{s+1}| \le \rho.$$

We combine both segments and arrive at (2.15). Hence, taking the
subsequent calculations into account, the step $h_s$ can be made
$\rho$ units bigger than $r_{ss}$. Here, however, we need to
specially check the condition for ending the process. From the
sequence resulting from the formulas (2.14)-(2.16) one can form a
sequence of segments $[x_i - \rho,\; x_i + h_i - \rho]$,
$i \in [2:k]$, completely covering the segment $[a,b]$. Hence for
each point $x \in [a,b]$ we can find an $x_i$ such that

$$x_i - \rho \le x \le x_i + h_i - \rho.$$

We use the inequality (1.9) and the definition (1.3), and we obtain

$$f(x) \ge f(x_i) - \ell |x - x_i| \ge R_k - \varepsilon. \tag{2.17}$$

Since $x$ is arbitrary in the segment $[a,b]$, we reach the
conclusion that the inequality (2.17) is valid everywhere on $[a,b]$.
Taking the minimum in $x$ on the left side of the inequality, we
get that $f_* \ge R_k - \varepsilon$. Hence the method guarantees that
on $[a,b]$, in a finite number of steps, the global minimum of an
arbitrary function $f(x)$ satisfying the Lipschitz condition (1.8)
will be found within an error $\varepsilon$.
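
The one-dimensional scheme above is short enough to write out in full. The following is a minimal sketch of the formulas (2.14)-(2.16), assuming no local search is used and omitting the bookkeeping for $v_s$, which plays no role when $n = 1$; the name `cover_segment` is hypothetical.

```python
import math


def cover_segment(f, a, b, lip, eps):
    """Non-uniform covering of [a, b]; returns (record point, record, evaluations)."""
    rho = eps / lip
    x = a + rho                                   # first covering point
    record, record_x, evals = math.inf, None, 0
    while x < b + rho:                            # condition (2.7)
        x = min(x, b)                             # a trial point beyond b is moved to b
        fx = f(x)
        evals += 1
        if fx < record:
            record, record_x = fx, x
        x += 2.0 * rho + (fx - record) / lip      # step h_s from (2.14)
    return record_x, record, evals


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.sin(10.0 * x / 3.0)
    # |f'(x)| <= 1 + 10/3, so 4.34 is a valid Lipschitz constant.
    print(cover_segment(f, 2.7, 7.5, lip=4.34, eps=1e-3))
```

For a constant function every step equals $2\rho$, which is the uniform-mesh behaviour; wherever the values lie well above the record the steps become correspondingly longer.
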

The step $h_s$ turns out to be minimal and equal to $2\rho$ at the
points where $f(x_s) = R_s$, and maximal where $f(x_s) \gg R_s$. This
has a simple intuitive interpretation: at points where the values
of $f(x)$ are considerably larger than the current value of the
record, we can vary $x$ with a big step without fear of missing the
needed points. Conversely, we have to vary $x$ with caution near the
points where $f(x_s)$ and $R_s$ are close, so that we do not miss
the points at which (2.17) is violated. This method is frequently
called an exhaustive search on a nonuniform mesh. One of the most
adverse cases for this method is the case of a constant function
$f(x)$; here the step of varying $x$ is constant and equal to
$2\rho$. The method turns into a full-scale search with constant
step; the number of steps required for the computations is roughly

$$K_1 = \frac{(b - a)\,\ell}{2\varepsilon}.$$

In the most favorable case the function being minimized has
the form $f(x) = c_1 + \ell (x - c_2)$, where $c_1$, $c_2$ are
arbitrary scalars. The number of necessary steps to solve the
problem is

$$K_2 \approx \log_2 K_1.$$

For an arbitrary function satisfying (1.8) the necessary number
of steps is contained in the interval $[K_2, K_1]$.

In getting the estimate $K_1$ no assumptions were made about
whether methods of local search were used in the counting process,
and no additional information about the properties of the function
being minimized was used. Moreover, it should be kept in mind
that an almost constant function has a Lipschitz constant close
to zero, so that the number of computations for functions close to
constant is not significant.

The quantity $v_s$ in the given one-dimensional case has no
influence on the performance of the algorithm. It is required,
however, in the multidimensional case. At the end of the
computations, we obtain some value $v_k$ equal to the minimal radius
of the spheres $S_{2k}, S_{3k}, \ldots, S_{kk}$.


4. THE MULTIDIMENSIONAL CASE




Let $n = 2$. Now the set $P$ is a rectangle and the set $V_{jk}$ is
a square. First we set

$$R_1 = f(x_1), \qquad x_2 = [a^1 + \rho,\; a^2 + \rho], \qquad v_2 = [u, u].$$

We call $N(2,a,b)$. If $x_2^2 < b^2 + \rho$, then $N(1,a,b)$ is
called. In the process of running this procedure, the second
components $x^2$, $v^2$ of the vectors $x$, $v$ do not change, whereas
for the first components we obtain from (2.12) the sequences
$x_2^1, \ldots, x_k^1$ and $v_2^1, \ldots, v_k^1$, analogous to those
considered in the preceding subsection. If condition (2.8) is
satisfied for some $s$, we redefine the $j$-th component of $v$ by
(2.10), $j = 1, 2$. The last point $x_k$ is found from the condition
(2.16). Using (2.13), we let

$$x_{k+1}^2 = x_k^2 + v_k^1, \qquad v_{k+1}^2 = \min [v_k^1, v_k^2].$$

If $x_{k+1}^2 < b^2 + \rho$, then a new call is made to $N(1,a,b)$,
and so on. The process stops when $m$ is found such that

$$x_m^2 < b^2 + \rho \le x_{m+1}^2.$$

Let us comment on this stage. Consider the sequence
$x_2, \ldots, x_k$ obtained as a result of the first pass using
$N(1,a,b)$. Introduce the rectangle

$$W = \{x \in R^2 : a^1 \le x^1 \le b^1,\;\; a^2 \le x^2 \le a^2 + v_k^1 - \rho\}.$$

Let $x \in W$. Then we can find an $x_j$ such that

$$x_j^1 - \rho \le x^1 \le x_j^1 + h_j - \rho, \qquad
  a^2 \le x^2 \le a^2 + v_k^1 - \rho. \tag{2.18}$$

The set of points defined by these inequalities is contained in
the cube $V_{jk}$. Therefore the inequality
$f(x) \ge R_k - \varepsilon$ holds at the points satisfying (2.18).
Since $x$ is arbitrary in $W$, we reach the conclusion that
$\min_{x \in W} f(x) \ge R_k - \varepsilon$; therefore the set $W$
can be omitted from further consideration. This can be done by
changing the parallelepiped $P$ in (2.1) to

$$P_1 = \{x \in R^2 : a^1 \le x^1 \le b^1,\;\; a^2 + v_k^1 - \rho \le x^2 \le b^2\}$$

and repeating the above computations on $P_1$. Arguing just as in
the preceding subsection, one can show that in determining $W$ the
inequality $x^2 \le a^2 + v_k^1 - \rho$ may be relaxed by setting
$x^2 \le a^2 + v_k^1$.

The two cases $n = 1$, $n = 2$ clarify the idea of covering
the parallelepiped $P$. Hence we will not describe the covering
process in more detail, since all the details merely follow from
the procedure $N$.
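
A compact recursive sketch of the covering of a parallelepiped is given below. It is a simplified reading of the procedure $N$, not a transcription of it: it assumes $a^i < b^i$ for every $i$, uses no local search, recomputes the minimal step for each row from scratch, and does not enlarge $v$ when the record improves (which keeps the steps conservative). The names `cover_box` and `sweep` are hypothetical.

```python
import math


def cover_box(f, a, b, lip, eps):
    """Cover a <= x <= b by cubes; lip is a Lipschitz constant in the Euclidean norm."""
    n = len(a)
    scale = lip * math.sqrt(n)
    rho = eps / scale                     # half-edge of the smallest cube
    state = {"record": math.inf, "point": None, "evals": 0}
    x = [0.0] * n                         # current point, shared by all levels

    def sweep(i):
        """Cover coordinates 0..i for the current x[i+1:]; return the smallest step."""
        v = math.inf
        x[i] = a[i] + rho
        while x[i] < b[i] + rho:          # analogue of condition (2.7)
            x[i] = min(x[i], b[i])
            if i == 0:
                fx = f(x)
                state["evals"] += 1
                if fx < state["record"]:
                    state["record"], state["point"] = fx, list(x)
                h = 2.0 * rho + (fx - state["record"]) / scale   # step (2.11)
            else:
                h = sweep(i - 1)          # smallest step of the lower-level pass
            v = min(v, h)
            x[i] += h                     # analogue of (2.12) and (2.13)
        return v

    sweep(n - 1)
    return state["point"], state["record"], state["evals"]


if __name__ == "__main__":
    f = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2
    print(cover_box(f, [-1.0, -1.0], [1.0, 1.0], lip=3.6, eps=0.05))
```

Each pass over the first coordinate returns the smallest step it used, and the next coordinate is advanced by exactly that amount, so that consecutive rows of cubes overlap and no strip of the parallelepiped is left uncovered.
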


The numerical computations noticeably speed up if a good 
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initial value xy is known and the value £(x,) is close to Te 
Hence, betorercaliing Ne Gediseuser dle Lomi icuatemleac tae rough 


estimate of f To do this, one can either use the random 


4° 
search method or call the procedure N, taking a reduced Lipschitz 
e 
Goose 5 Ce MMNereESe  E- 
If the value of the Lipschitz constant % is a priori un- 
known, one can start with some value Lo to solve the problem 
with Lipschitz constants | 22 Aho and so on, till the result 


0° 
(the value f(x, )) is not different from the preceding one by 
more than e. This criterion is a necessary but not sufficient 
condition for the value of the Lipschitz constant obtained to be 
greater than or equal to the true value of the Lipschitz constant. 
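
The doubling strategy for an unknown Lipschitz constant can be sketched in the one-dimensional case as follows. The covering routine is repeated here so that the example is self-contained; `cover_segment`, `solve_with_unknown_constant` and the cap `max_doublings` are hypothetical names introduced for this illustration.

```python
import math


def cover_segment(f, a, b, lip, eps):
    """One-dimensional non-uniform covering; returns (record point, record)."""
    rho = eps / lip
    x, record, record_x = a + rho, math.inf, None
    while x < b + rho:
        x = min(x, b)
        fx = f(x)
        if fx < record:
            record, record_x = fx, x
        x += 2.0 * rho + (fx - record) / lip
    return record_x, record


def solve_with_unknown_constant(f, a, b, eps, lip0, max_doublings=20):
    """Double the trial constant until two successive records differ by <= eps."""
    lip = lip0
    _, previous = cover_segment(f, a, b, lip, eps)
    for _ in range(max_doublings):
        lip *= 2.0
        x, value = cover_segment(f, a, b, lip, eps)
        if abs(previous - value) <= eps:
            return x, value, lip
        previous = value
    raise RuntimeError("records did not stabilize within max_doublings")


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.sin(10.0 * x / 3.0)
    print(solve_with_unknown_constant(f, 2.7, 7.5, eps=0.01, lip0=0.5))
```

As the text warns, the stopping rule is only a necessary condition: two successive records may agree while the trial constant is still too small, so the returned value carries no guarantee unless the final constant is known to be valid.
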

Experience with numerical computations shows that the use of
local methods speeds up the computations substantially. This
approach carries over to the case of functions satisfying (1.16),
examined in the preceding section.

Here we end the description of the method for solving (2.1),
which we shall refer to as the primal method in the sequel. This
method can be refined by removing the constraints introduced above
with respect to the computer memory and the simplicity of the
program used. Various modifications of this method have been devised,
and we shall mention some of them next.

Let us first point out the simplest modification, connected
with the possibility of covering $P$ by parallelepipeds instead
of by cubes. We illustrate the idea using the two-dimensional case
as an example. Let the procedure $N(1,a,b)$ run and let some
value $v_s^2 > 2\rho$ be known. Let $f(x_s)$ and $r_{ss}$ be
determined and suppose that $r_{ss}$ is larger than the difference
$v_s^2 - \rho$. Then we inscribe in the circle $S_{ss}$, instead of a
square (see (1.12)), a rectangle such that one side is equal to
$2 (v_s^2 - \rho)$ and the other is equal to

$$2 \bar\rho_s = 2 \sqrt{2 r_{ss}^2 - (v_s^2 - \rho)^2}.$$

From the fact that $r_{ss} > v_s^2 - \rho$ it follows that
$\bar\rho_s > r_{ss}$. Hence one can omit all the points of the
rectangle

$$x_s^1 - \bar\rho_s \le x^1 \le x_s^1 + \bar\rho_s, \qquad
  x_s^2 - v_s^2 + \rho \le x^2 \le x_s^2 + v_s^2 - \rho$$

from further analysis. In (2.12) one can take
$h_s = \rho + \bar\rho_s$, which is bigger than the step given by
(2.11).

This modification makes it possible to increase the search
interval for the first coordinate without decreasing the size
of the current interval for the second coordinate. This idea
carries over to the multidimensional case. Let us now turn to a
more radical modification of the method.


5. A FIRST MODIFICATION 


Numerical computations show that the effectiveness of the method
decreases appreciably as the dimension of the vector $x$
increases. Especially laborious were problems in which the global
minimum is attained on a "plateau." In this method, the
components of $v$ decrease as the number of coordinates increases,
and for multidimensional problems the search in the last
coordinates often has to be done with the minimal step equal to
$2\rho$. To remove this defect at least partially, the following
modification is offered.

We modify the method so that the minimal step size for the
second, third, and subsequent coordinates up to the last is never
less than $2\rho q$, i.e., $v^i \ge 2\rho q$, where $q$ is a given
number. For this we proceed as follows. At some point $x_s$ suppose
we find that $h_s < 2\rho q$. We define the coordinates of the
vectors $\bar a, \bar b \in R^n$ by the formulas
-i i 
a =o Ola 
b+ = min po x © o5q] . ise, Lin] 
We call $N(n, \bar a, \bar b)$. As a result, the global minimum of
$f(x)$ on the parallelepiped

$$\bar P = \{x \in R^n : \bar a \le x \le \bar b\}$$

will be determined with error $\varepsilon$. In the process of
solving this auxiliary problem, a new record point might be found,
in which case its value is taken as $R_s$. After solving this
problem, we set


1 er wd 
er ee Sa Ce aaa 
ee 
VAats ht epals -Oae) 


and the computations are carried further by the initial method.
This recursive call of the basic program is simple to implement in
ALGOL-60, and it is also easy to ensure that in solving the
auxiliary problem, the local methods seek the minimum of $f(x)$
on $P$.

This modification substantially improves the operation of
the method since it makes it possible to refine the step size for
inspecting the set $P$ locally, just near the points of minimum
of $f(x)$.

It is hard to make general recommendations for selecting $q$.
Obviously, if $q \le 1$ then no changes in the operation of the
method are introduced, since the minimal step is always greater
than or equal to $2\rho$. On the other hand, if one takes $q$ so
large that 2pq %v pee then there will still be no reduction 
in the volume of calculations. For moderate values of q we 


have been able to cut the volume of calculations by a factor of 


25tors, ina number of yproblemsr 


6. A SECOND MODIFICATION


In analyzing the set (2.18), it was shown to be contained in the
cube $V_{jk}$. Herein lies one essential drawback of the method.
The requirement for dense sequential packing of rectangles leads
to the situation that the segment
$[x_j^1 - \rho,\; x_j^1 + h_j - \rho]$ is dropped, although the
larger segment $[x_j^1 - h_j + \rho,\; x_j^1 + h_j - \rho]$ could be
omitted instead. Hence it is desirable not to place the points $x_j$
inside the segment $[a, b]$ too closely at once. Of course, one then
needs to remember all the segments on which the search has to be
continued and to check the completeness of the covering.
We illustrate this with a one-dimensional example. Let us denote
by $[d, g]$ the set of points $d \le x \le g$. In case $g \le d$ this
segment is omitted from the sequences of segments defined below.

First we take $x_2 = \frac{a + b}{2}$ and calculate $f(x_2)$. The
search for the minimum is continued on the two segments

$$[a,\; x_2 - r_{22}], \qquad [x_2 + r_{22},\; b].$$

Let $x_3 = \frac{a + x_2 - r_{22}}{2}$ and calculate $f(x_3)$.
We continue the search on the three segments

$$[a,\; x_3 - r_{33}], \qquad [x_3 + r_{33},\; x_2 - r_{23}], \qquad [x_2 + r_{23},\; b].$$

Upon computing the value of $f$ at $x_4$, the midpoint of the
last segment, we obtain a set of four segments

$$[a,\; x_3 - r_{34}], \qquad [x_3 + r_{34},\; x_2 - r_{24}], \qquad
  [x_2 + r_{24},\; x_4 - r_{44}], \qquad [x_4 + r_{44},\; b].$$

We choose the largest segment and take its midpoint as $x_5$. If
in the process of these computations the quantity $R_k$ changes,
then $r_{jk}$ changes and in accordance with (2.5) the lengths of
the segments on which the minimum has to be sought decrease, with
some segments vanishing. This process is continued until all the
segments have disappeared from the variable set of segments on
which the minimum is to be sought. Possibly this occurs right
away after computing $f(x_2)$, if

$$f(x_2) - R_2 \ge \ell \, \frac{b - a}{2} - \varepsilon.$$

If $f$ was computed the last time at $x_k$, then we set

$$v = \min_{i \in [2:k]} r_{ik}.$$

The procedure carries over to the general $n$-dimensional
case and becomes a recursive procedure analogous to $N$. The
collections of segments on which the minimum is to be sought on each
coordinate should be stored in the computer memory. Only for the
$n$-th coordinate will these segments be stored, split and reduced
throughout the computation. For each $i$-th coordinate and
each new call of $N(i,a,b)$ this system of segments is determined
anew for the coordinates with indices $1, 2, \ldots, i$. This
modification is seen to be quite effective, in spite of the fact
that in the most unfavorable case, when $f$ is constant, it leads to
roughly the same result as the basic method does. The segment
$[a,b]$ will be covered by a uniform mesh, and the number of
calculations of $f(x)$ will be close to $K_1$.
Implementation of this modification does not require very
large computer memory, since the number of stored segments does
not exceed $K_1$. By a recursive procedure analogous to $N$, the
above-described covering carries over to a multidimensional case.
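
The one-dimensional version of this modification can be sketched as follows. The uncovered part of $[a, b]$ is kept as a list of segments; the next point is the midpoint of the largest one, and whenever the record improves the list is rebuilt because every radius has grown. The names `subtract` and `cover_by_bisection` are hypothetical, and no local search is used.

```python
import math


def subtract(segments, lo_cut, hi_cut):
    """Remove the interval [lo_cut, hi_cut] from every segment in the list."""
    out = []
    for lo, hi in segments:
        if lo_cut > lo:
            out.append((lo, min(hi, lo_cut)))     # part left of the cut
        if hi_cut < hi:
            out.append((max(lo, hi_cut), hi))     # part right of the cut
    return out


def cover_by_bisection(f, a, b, lip, eps):
    """Returns (record point, record, number of evaluations)."""
    rho = eps / lip
    points, record, record_x = [], math.inf, None
    segments = [(a, b)]
    while segments:
        lo, hi = max(segments, key=lambda s: s[1] - s[0])
        x = 0.5 * (lo + hi)                       # midpoint of the largest segment
        fx = f(x)
        points.append((x, fx))
        if fx < record:
            record, record_x = fx, x
            segments = [(a, b)]                   # all radii grew: rebuild
            for c, fc in points:
                r = rho + (fc - record) / lip
                segments = subtract(segments, c - r, c + r)
        else:
            r = rho + (fx - record) / lip
            segments = subtract(segments, x - r, x + r)
    return record_x, record, len(points)


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.sin(10.0 * x / 3.0)
    print(cover_by_bisection(f, 2.7, 7.5, lip=4.34, eps=0.05))
```

Because each excluded interval extends on both sides of its center, a single evaluation can remove up to twice the length that the sequential scheme would remove, at the cost of storing the list of remaining segments.
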


7. PARALLEL COMPUTATIONS 


The operating efficiency of this method and its modifications
depends on the sequence of coordinatewise coverings of the
parallelepiped. Before the computations, it is hard to determine the
best sequence. Multiprocessor computers are most helpful in such
cases. We show how the procedure $N$ is modified in this case.
Let us introduce variable $n$-dimensional vectors
$a_s, b_s \in R^n$. At the beginning of the computations we put
$a_1 = a$, $b_1 = b$. Suppose that according to the basic method, the
computations are carried on with the procedure $N(i,a,b)$, which we
write here as $N(i,a,b,w_1)$, where the vector $w_1 \in R^n$
designates the sequence in which the covering occurs. In the basic
method $w_1 = [1, 2, \ldots, n]$. After the first call of
$N(n-1,a,b,w_1)$, the value of the record $R_k$ is determined and
the value of $v_k^n$ is computed.

Hence in the further computations we need to find the minimum of
$f$ on the parallelepiped $P_1 = \{x \in R^n : a_1 \le x \le b_1\}$,
where all the components of $a_1$ and $b_1$ coincide with the
corresponding components of $a$ and $b$, except
$a_1^n = a^n + v_k^n - \rho$.

Suppose that simultaneously with the first processor, on which
the procedure $N(n,a,b,w_1)$ was run, a second processor
operated on which the procedure $N(n,a,b,w_2)$ was used, with the
sequence of covering coordinates given as
$w_2 = [2, 3, \ldots, n, 1]$. After the first call of
$N(n-1,a,b,w_2)$, the record $R_k$ and the quantity $v_k^1$ are
obtained. As a result of operating both processors, the further
search can be continued on the parallelepiped

$$P_2 = \{x \in R^n : a^1 + v^1 - \rho \le x^1 \le b^1,\;\;
  a^i \le x^i \le b^i \ (i \in [2:n-1]),\;\;
  a^n + v^n - \rho \le x^n \le b^n\}.$$

For the simultaneous operation of two processors an exchange of
the refined records and reductions of the parallelepiped is
needed.

If the computer has $n$ processors, then on the $i$-th
processor one can call the procedure $N(n,a,b,w_i)$, where

$$w_i = [i, i+1, \ldots, n, 1, \ldots, i-1].$$

The refined records and reductions of the parallelepiped are
exchanged between all the processors.

Other variants are also possible for computations using
multiprocessor computers. For example, the parallelepiped $P$ can be
partitioned into $s$ parts, $s$ being the number of processors.


8. NUMERICAL RESULTS


The problem of finding 
1 8 F hes 
ee = ae 3 & sin an{xd + | ; 
jedi } 
where | 0 < x wil pdorell oG,fprovided attest problems (Within 
the feasible set this function has 30 isolated local maxima and 
two global maxima, £> =$1; miwitheien=N0n0ly ef =v027 yagi f Zfior 
solving this problem one uses a complete search on a uniform mesh, 
it is necessary to evaluate the objective function Aatiors times. 
The computations using a nonuniform covering were carried out in # 
two stages: first the problem with 2% =0.2 was solved, and next 
with % = 0.7. The computations by the primary scheme needed 
3+107 evaluations of the function; better modifications can 
reduce it to a few thousands. Local methods could find the global 
solution with high accuracy much quicker. Nevertheless, the meth- 
od can find a guaranteed result also for more complex problems, 


but, of course, does not finish computing before the feasible set 


has been covered completely. 
3. SOLUTION OF NONLINEAR PROGRAMMING PROBLEMS 


1. THE STATEMENT OF THE PROBLEM 


The approach described in the preceding sections carries over to
solving nonlinear programming problems. Suppose the global minimum
is sought:

$$f_* = \min_{x \in P \cap X} f(x), \tag{3.1}$$

where

$$X = \{x \in \mathbb{R}^n : h(x) \le 0\},$$

$P$ being an $n$-dimensional parallelepiped defined by (2.2) and
$h : \mathbb{R}^n \to \mathbb{R}^c$.
Analogously to (1.2) we define the set of approximate global
solutions of the problem (3.1):

$$X_\varepsilon = \{x \in P \cap X : f(x) \le f_* + \varepsilon\}. \tag{3.2}$$


The set $X$ can have a very general form; it can be nonconvex and
non-simply connected. It is assumed that the functions $f$ and
$h^i(x)$ satisfy a Lipschitz condition on $P$ with the same constant
$\ell$, i.e., for any $x, y \in P$ one has (1.8) and

$$|h^i(x) - h^i(y)| \le \ell \|x - y\|, \quad i \in [1:c].$$

In this case (3.1) has a solution. As before, we denote by $X_*$
the set of all global solutions. Introduce the function

$$w(x) = \max_{1 \le i \le c} h^i(x).$$


Let $\Gamma$ denote the set of boundary points of $X$. Using the
function $w$, one can write $\Gamma = \{x \in \mathbb{R}^n : w(x) = 0\}$.

For each $x_s \in \Gamma$ we define the ball

$$B(x_s) = \{x \in \mathbb{R}^n : \|x - x_s\| < \varepsilon/(2\ell)\}.$$

The union of all such balls when $x_s$ takes on all possible values
from $\Gamma$ will be denoted by $Y$. This set is thus some open cover
of the set $\Gamma$.

Using Theorem 1.5.2, it is easy to show that $w(x)$ satisfies
a Lipschitz condition with constant $\ell$, i.e.,

$$|w(x) - w(y)| \le \ell \|x - y\|. \tag{3.3}$$

Hence for each $x \in B(x_s)$ we have the estimate
$|w(x)| < \varepsilon/2$, and for each point $x \in Y \cap X$ the
inequalities $-\varepsilon/2 < w(x) \le 0$ are satisfied.
Let $Z = P \cap (X \setminus Y)$. We replace problem (3.1) by the
following problem:

$$f_1 = \min_{x \in Z} f(x). \tag{3.4}$$


In somewhat nonstandard fashion, we define the set of approximate
solutions of this problem:

$$Z_\varepsilon = \{x \in P \cap X : f(x) - f_1 \le \varepsilon/2\}. \tag{3.5}$$


Suppose that $Z \ne \emptyset$. Then we have the following lemma.

LEMMA 1. Let the set $X$ have interior points and let the closure
of the interior coincide with $X$. Then $Z_\varepsilon \subset X_\varepsilon$.


Proof. Let $x_1 \in Z_\varepsilon$. We show that

$$0 \le f(x_1) - f_* \le \varepsilon. \tag{3.6}$$

We now compare (3.1) and (3.4). From the obvious inclusion
$Z \subset P \cap X$ it follows that $f_* \le f_1$. Hence for every
$x_1 \in Z$ (and a posteriori for a point in $Z_\varepsilon$) the left
inequality in (3.6) is satisfied. From the definition (3.5) it
follows that

$$f(x_1) - f_1 \le \varepsilon/2. \tag{3.7}$$

Consider the case where there is at least one point $x_* \in X_*$ such
that $x_* \notin Y$. Then $f_1 = f_*$ and, from (3.7), one has

$$f(x_1) - f_* \le \varepsilon/2,$$

which is stronger than (3.6).
We consider next a second case: $X_* \subset Y$. The assumption of
the Lemma that the closure of the interior of $X$ coincides with $X$
excludes sets with "tentacles." Hence for any point $x_* \in X_*$ we
can find a point $x_2 \in Z$ such that
$\|x_* - x_2\| < \varepsilon/(2\ell)$. Using a Lipschitz condition,
we obtain

$$f(x_2) - f_* \le \varepsilon/2. \tag{3.8}$$

Since $x_2 \in Z$, it follows that $f_1 \le f(x_2)$. Hence from (3.8)
we find

$$f_1 - f_* \le \varepsilon/2.$$

Combining this inequality with (3.7), we arrive at the right side
of (3.6).
Comparing (3.6) with the definition (3.2) of the set $X_\varepsilon$,
we reach the conclusion that $x_1 \in X_\varepsilon$, and by the
arbitrariness of $x_1$ we arrive at the desired inclusion
$Z_\varepsilon \subset X_\varepsilon$. The Lemma enables us
to replace the problem of finding the global minimum of (3.1) by
an approximate solution of the problem (3.4). The error of
determining the minimal value of $f$ is $\varepsilon$, and if the
global minimum in (3.1) is attained at least at one point of
$X \setminus Y$, the error does not exceed $\varepsilon/2$. The set
$Y$ was deliberately included in $Z_\varepsilon$, in spite of the
fact that all points in $Y$ are not feasible in the problem (3.4),
since $Z \cap Y = \emptyset$. We did it because in solving (3.4) we
may obtain points not belonging to $Z$ but belonging to $X$ instead.
For example, this can happen in using auxiliary procedures for
finding a local minimum. It seems natural to take into account the
best points obtained in determining the record.
2. A METHOD FOR SOLVING PROBLEMS


Suppose the values of $f(x)$ in the sequence of points $\{x_k\}$
belonging to the parallelepiped $P$ are calculated. We use (1.3),
taking the minimum just over the values $f(x_i)$ at points
$x_i \in X$. Thus, as before, the $R_k$ can be called a record since
these are the minimal values of $f(x)$ at feasible points of the
$\{x_k\}$.

All points satisfying the inequality

$$R_k - \varepsilon/2 \le f(x) \tag{3.9}$$

are of no interest in solving (3.4) since the minimum of $f(x)$ on
this set cannot improve the value of the record by more than
$\varepsilon/2$. If $x_s \in X$ and $f(x_s) \ge R_k$, the condition
(3.9) is satisfied at all points which belong to the ball centered
at $x_s$ and are enclosed by the sphere of radius

$$r_{sk} = \frac{f(x_s) - R_k + \varepsilon/2}{\ell}. \tag{3.10}$$

If the union of such balls covers the set $(X \setminus Y) \cap P$,
the problem (3.4) will be solved. We can do this by covering the
entire parallelepiped $P$, but we still need to derive formulas for
covering unfeasible points.

Let $x_s \notin X$. We determine a neighborhood of $x_s$ which can
be omitted from consideration. Here the following is of interest:

1. one can omit the points for which (3.9) is satisfied;

2. one can omit the points for which $w(x) > 0$, since they
are unfeasible in (3.4);

3. one can omit the points which lie inside the sphere
centered at $x_s$ with radius less than the shortest distance from
$x_s$ to $X \setminus Y$, since these points are also unfeasible in (3.4).
Using Lipschitz conditions (1.8) and (3.3), we can show that
conditions 1 and 2 are satisfied, respectively, by the points
lying inside the balls:

$$B_1 = \Bigl\{x \in \mathbb{R}^n : \|x - x_s\| \le \frac{f(x_s) - R_k + \varepsilon/2}{\ell}\Bigr\},$$

$$B_2 = \Bigl\{x \in \mathbb{R}^n : \|x - x_s\| \le \frac{w(x_s)}{\ell}\Bigr\}.$$

Let us show now that condition 3 is satisfied by points in

$$B_3 = \Bigl\{x \in \mathbb{R}^n : \|x - x_s\| \le \frac{w(x_s) + \varepsilon/2}{\ell}\Bigr\}.$$
Obviously, $B_2 \subset B_3$, $w(x) \ge 0$ everywhere on $B_2$ and
vanishes only on the boundary. Hence the interior of $B_2$ contains
no points of $X$, and boundary points of $X$ can lie on the boundary
of $B_2$. Hence the point $x_*$ from $X \setminus Y$ that is closest
to $x_s$ is either inside the layer $H = B_3 \setminus B_2$ or on the
boundary of $B_3$, or outside $B_3$.
We show that there is no point of $X \setminus Y$ inside $H$. Assume
that such an $x_*$ exists. Then

$$\|x_* - x_s\| < \frac{w(x_s) + \varepsilon/2}{\ell}. \tag{3.11}$$

Since $x_* \in X$ and $x_* \notin Y$, it follows that inside the sphere

$$B_4 = \{x : \|x - x_*\| = \varepsilon/(2\ell)\}$$

there are no points which do not belong to $X$; in particular
there are no points of $B_2$. Points of $X$ and possibly of $B_2$
lie on $B_4$. We construct the straight line through $x_s$ and $x_*$.
From the two points of intersection of the line with the sphere
$B_4$ we take the one closest to $x_s$, which we denote $\bar{x}$. Since
$\bar{x} \in B_4$, it follows that $\bar{x} \in X$. We use the representation

$$\|x_s - x_*\| = \|x_s - \bar{x}\| + \|\bar{x} - x_*\|.$$

Noting (3.11) and the fact that $\bar{x} \in B_4$, we obtain

$$\|x_s - \bar{x}\| = \|x_s - x_*\| - \frac{\varepsilon}{2\ell} < \frac{w(x_s)}{\ell},$$

i.e. $\bar{x} \in \operatorname{int} B_2$, which contradicts the fact that
$\bar{x} \in X$. Hence it follows that there are no points of
$X \setminus Y$ in the interior of $B_3$.


From the formulas obtained for the balls $B_1$, $B_2$ and $B_3$
we find the formula for the largest ball. Let

$$B_{sk} = \Bigl\{x \in \mathbb{R}^n : \|x - x_s\| \le \frac{\varepsilon/2 + d_{sk}}{\ell}\Bigr\}, \tag{3.12}$$

where

$$d_{sk} = \max\,[\,w(x_s),\ f(x_s) - R_k\,].$$

If (3.12) is used for feasible points with $x_s \in X$, we then
obtain that $w(x_s) \le 0$ and $f(x_s) \ge R_k$, so that

$$d_{sk} = f(x_s) - R_k,$$

i.e., the radius of the sphere enclosing $B_{sk}$ coincides with (3.10).
Thus, (3.12) is a general formula for determining balls with
feasible or unfeasible centers, which can be omitted. We arrive
at the following lemma.

LEMMA 2. Let the sequence of points $\{x_k\}$ in $P$ be such that
the union of the balls $B_{jk}$ ($j \in [1:k]$) completely covers the
parallelepiped $P$. Then every point $x_r \in X$ such that
$f(x_r) = R_k$ belongs to the set of approximate solutions
$X_\varepsilon$ of the problem (3.1).

This lemma enables one to solve the problem (3.1) by
constructing a covering of $P$. For this one can use the results of
the preceding section. The parallelepiped will be covered by
balls enclosed by spheres of distinct radii. The smallest radii
will occur at those points at which $f(x)$ takes on values close to
the current record $R_k$. Far away from the boundary $\Gamma$ (more
precisely, where $w(x) \gg 0$) and at points where $f(x) \gg R_k$ the
radii of the spheres will be larger.
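
The covering rule of Lemma 2 can be illustrated in one dimension. The following is only a minimal sketch under stated assumptions, not the scheme of the preceding sections: the function name `constrained_covering_1d`, the left-to-right scan, and the test problem are all hypothetical, and the Lipschitz constant is assumed to be known. Each evaluated point discards a ball of radius $(\varepsilon/2 + d_{sk})/\ell$ as in (3.12); the next point is placed so that its own ball, whose radius is at least $\varepsilon/(2\ell)$, closes the gap.

```python
import math


def constrained_covering_1d(f, w, a, b, lip, eps):
    """Scan [a, b] left to right, discarding balls of radius (eps/2 + d) / lip.

    f   : objective, Lipschitz on [a, b] with constant lip
    w   : constraint function; x is feasible when w(x) <= 0 (same constant lip)
    Returns (record value, record point, number of evaluations).
    """
    record, record_x = math.inf, None
    r_min = eps / (2.0 * lip)          # radius obtained when d = 0
    x, evaluations = a, 0
    while True:
        fx, wx = f(x), w(x)
        evaluations += 1
        if wx <= 0.0 and fx < record:  # a feasible point improves the record
            record, record_x = fx, x
        d = max(wx, fx - record)       # d = max[w(x_s), f(x_s) - R_k]
        covered = x + (eps / 2.0 + d) / lip
        if covered >= b:               # the balls now cover all of [a, b]
            return record, record_x, evaluations
        x = min(covered + r_min, b)    # the next ball has radius >= r_min


# Hypothetical test problem: minimise sin(3x) + x/2 on [0, 4] subject to x <= 1.4.
f = lambda x: math.sin(3.0 * x) + 0.5 * x
w = lambda x: x - 1.4
record, x_best, n_eval = constrained_covering_1d(f, w, 0.0, 4.0, lip=3.5, eps=0.1)
print(record, x_best, n_eval)
```

In this example the constraint is active at the minimizer, so the record point lies near the boundary; on the unfeasible part of the interval the radii grow with $w(x)$ and few evaluations are spent there.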

To reduce the calculations it is essential to know as accurately
as possible the value of the current record. Hence at the points
$x_i \in X$ at which (1.7) is satisfied one should go to the local
methods of solving (3.1), sharpen the value of $R_i$ and enlarge
the balls $B_{ji}$ ($j = 1, \dots, i$) if they are saved during the
computation.


3. TAKING INTO ACCOUNT EQUALITY CONSTRAINTS 


It may happen that $X \setminus Y = \emptyset$. This occurs, for
example, if among the constraints on the feasible set $X$ there are
equality-type constraints. Such constraints give sets with empty
interior.

Consider first a particular case. Let $w(x) > 0$ everywhere on $P$
with the exception of a finite number of points at which
$w(x) = 0$. Instead of this problem we consider the problem of
finding the global minimum on the intersection of $P$ and

$$X_\delta = \{x : w(x) \le \delta\},$$

where $\delta > 0$ specifies the accuracy.
On the one hand, it is desirable to take the value $\delta$
characterizing the accuracy of fulfilling the constraints as small as
possible, and on the other hand, in order to use our approach,
$\delta$ should be such that $X_\delta \setminus Y_\delta \ne \emptyset$.
Here $Y_\delta$ denotes the union of all the balls

$$B(x_s) = \{x : \|x - x_s\| < \varepsilon/(2\ell)\}$$

as $x_s$ ranges over all possible values of the boundary of $X_\delta$.
Lixo) Satis tbesethe Li psichitz. condition eC3.3)) 05 ven 
Rael te ca il SHO A ea ee 


In the general case of seeking a global minimum under equality
constraints, one can use an analogous technique for passing to
inequality constraints. The feasible set $X$ given by (1.6.2)
can then be replaced by the set

$$X_\delta = \{x : h(x) \le 0,\ \|g(x)\| \le \delta\}.$$

Let us solve the problem (3.1) in which $X_\delta$ is taken as $X$,
using this method. The approximate solution will satisfy the
inequality constraints exactly; the equality constraints $g(x) = 0$
will be satisfied approximately with error not larger than $\delta$.
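
As a small illustration of the passage from equality to inequality constraints, the sketch below builds a single scalar function whose nonpositive values describe the relaxed feasible set. It is an assumption-laden example: the helper name `relaxed_constraint`, the use of the Euclidean norm for $g$, and the sample constraints are hypothetical and are not taken from the text.

```python
import math


def relaxed_constraint(h_list, g_list, delta):
    """Return w with w(x) <= 0 iff every h(x) <= 0 and ||g(x)|| <= delta.

    h_list : inequality constraint functions h^i(x) <= 0
    g_list : equality constraint functions g^j(x) = 0, relaxed to ||g(x)|| <= delta
    (Euclidean norm is an assumption of this sketch.)
    """
    def w(x):
        h_part = max((h(x) for h in h_list), default=-math.inf)
        g_norm = math.sqrt(sum(g(x) ** 2 for g in g_list))
        return max(h_part, g_norm - delta)
    return w


# Hypothetical constraints: x0 <= 1 and x0 + x1 = 1, relaxed with delta = 0.1.
w = relaxed_constraint(
    h_list=[lambda x: x[0] - 1.0],
    g_list=[lambda x: x[0] + x[1] - 1.0],
    delta=0.1,
)
print(w((0.5, 0.55)), w((0.5, 0.7)), w((1.5, -0.5)))
```

A function built this way can play the role of $w(x)$ in the covering formulas, provided a common Lipschitz constant for $f$, $h$ and $g$ is available.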


4. CONCLUSIONS


It is worthwhile to compare problem (2.1) of finding the minimum
of $f(x)$ on $P$ with the problem (3.1) under the additional
constraints $x \in X$. At first glance it seems paradoxical,
although it is true, that finding the global solution in (3.1) is
simpler than in (2.1). The constraint $x \in X$ provides an
additional possibility to increase the radii of the covering balls on
$P \setminus X$. Hence, the additional constraints merely simplify
the problem of finding global solutions. The auxiliary procedures of
local search in the problem (3.1) are not much more complex than in
the problem (2.1), if we are concerned with the problem of finding
the global minimum. In that case, the employment of the penalty
function method becomes absolutely irrational for finding the
global solution of (3.1). We exclude a rather unusual case where
the upper estimate of the Lagrange multipliers is known sufficiently
precisely at the global minimum point (see Section 3.2) and
the problem reduces to a one-step minimization of the exact penalty
function. In the general case where the penalty function method
is used, passing from (3.1) to the multiple solution of the
problem (2.1) substantially complicates the computation.

Thus, the penalty function method, though a very effective 
tool for finding local solutions, is not advantageous for finding 
global solutions. The same can be said about other local methods 
which require multiple minimization in x of auxiliary functions 
(the cost-function parametrization method, the method of modified 
Lagrangians, among others). We point out once more that all these 
methods play an important but only auxiliary role in finding glob- 
al solutions. 

Let us consider another essential property of this method.
To reduce the number of computations, the exploitation of
necessary conditions for a minimum as additional constraints gives a
basis for expecting good results. We illustrate our statement,
using a simple version of problem (2.1) as an example. Suppose
that $f(x)$ is differentiable on $P$ and attains a global minimum
at an interior point of $P$. We introduce the necessary condition
for a minimum $f_x(x) = 0$ as an additional constraint in relaxed
form $\|f_x(x)\| \le \delta$. In this case, instead of (1.12) one
can use (3.12), where

$$d_{sk} = \max\,[\,\|f_x(x_s)\| - \delta,\ f(x_s) - R_k\,].$$

This makes it possible to enlarge the volume of the balls $B_{sk}$.
If one drops the requirement that the global minimum be attained
inside $P$, then the necessary conditions of the minimum are
written in the form


$$f_x^T(x)\, D(x - a)\, D(b - x) = 0.$$

We determine the feasible set in (3.1) as follows:

$$X = \{x \in \mathbb{R}^n : \|f_x^T(x)\, D(x - a)\, D(b - x)\| \le \delta\}.$$

This idea carries over not only to (2.1) but also to other, more
complex problems in which some properties of the solutions are
known.


We shall dwell briefly on yet another approach to solving
(3.1). Let the feasible set be defined by the scalar function
$\varphi$:

$$X = \{x \in P : \varphi(x) \le 0\},$$

where the function $\varphi$ satisfies on $P$ a Lipschitz condition

$$|\varphi(x) - \varphi(z)| \le \ell \|x - z\|.$$

In contrast to (3.3), the set of approximate solutions is:

$$X_\varepsilon = \{x \in P : \varphi(x) \le \varepsilon\}.$$

To each point $x_j \in P$ we associate the ball

$$B_{jk} = \{x \in \mathbb{R}^n : \|x - x_j\| \le r_{jk}\};$$


here the radius is given by 
e 


pax = ¢ max[e + i) - Ris one) Oily es 
rwileotc © mmen O eau —<aere 

THEOREM. Let the set $X \ne \emptyset$ and let the sequence of
points $\{x_j\}$ of $P$ be such that

$$P \subset \bigcup_{j=1}^{k} B_{jk}.$$

Then it is guaranteed that each record point $x_r \in X_\varepsilon$.

The proof follows the same arguments as above. As the function
$\varphi(x)$ one may take, for instance, the Hölder norm of the vector
$F(x)$ which was defined in Section 3.2. In this case the function
$\varphi(x)$ satisfies a Lipschitz condition if all components of the
vector functions $g(x)$, $h(x)$ satisfy a Lipschitz condition.


4. SOLUTION OF SYSTEMS OF ALGEBRAIC EQUATIONS 


1. GENERAL SCHEME OF COMPUTATIONS


On the $n$-dimensional parallelepiped $P$ let a mapping
$F : P \to \mathbb{R}^m$, $m \le n$, be defined and $P$ be given by
(2.2). The problem consists in finding solutions of the system

$$F(x) = 0, \quad x \in P. \tag{4.1}$$

Approximate solutions of the problem will be given by the points
of the set $X_\varepsilon = \{x \in P : \|F(x)\| \le \varepsilon\}$.
To solve the problem, it suffices to find at least one point in
$X_\varepsilon$. We assume that further computations for sharpening
the solution of (4.1) will involve local methods (see Section 2.5).
When $X_\varepsilon$ is empty, the algorithm should guarantee that the
assertion concerning the absence of approximate solutions be true.

We suppose that the mapping $F(x)$ satisfies a Lipschitz
condition on $P$, i.e., for any $x, y \in P$ one has

$$\|F(x) - F(y)\| \le \ell \|x - y\|. \tag{4.2}$$

The problem (4.1) is equivalent to the minimization of the norm
of $F(x)$ on $P$:

$$f_* = \min_{x \in P} f(x), \tag{4.3}$$

where $f(x) = \|F(x)\|$. If (4.1) has a solution, then $f_* = 0$;
otherwise, $f_* > 0$. To solve (4.3) one can use the results of
the preceding sections. For the sequence of points $\{x_k\}$ in $P$
we use (1.3) and (1.4) to determine the record $R_k$ and the set
$Z_k$. If $R_k \le \varepsilon$, then (4.1) is solved; otherwise, the
set $Z_k$ can be excluded from the search. If $x_s$ is an interior
point of $Z_k$, then along with $x_s$ one can omit from consideration
some ball $B_{sk}$ centered at $x_s$. For all points of this ball the
condition (1.10) has to be satisfied. We use the known inequality

$$\bigl|\,\|a\| - \|b\|\,\bigr| \le \|a - b\|.$$

Then (4.2) yields

$$|f(x_s) - f(x)| \le \|F(x_s) - F(x)\| \le \ell \|x_s - x\|.$$

Hence (1.10) a priori holds if (1.11) holds, whence we conclude
that the ball $B_{sk}$ and the sphere enclosing it are determined
by the formulas of Section 1. Further constructions are made
following the schemes described above. The covering of $P$ yields
either at least one point from $X_\varepsilon$ or guarantees that
$X_\varepsilon$ is empty; also we find a point $x_r$ furnishing the
norm $f$ with the smallest value $R_k$ on $P$ (with error not larger
than $\varepsilon$).


2. A SIMPLIFIED VERSION 


When it is known that $X_\varepsilon$ is not empty, the computations
can be made slightly simpler. Take $\varepsilon_1$ with
$0 < \varepsilon_1 < \varepsilon$. Let $f(x) = \|F(x)\|$ be calculated
at the points $\{x_k\}$. If the $R_k$ determined from (1.3) is less
than $\varepsilon$, then the problem is solved; otherwise we omit
from consideration the points $x$ which satisfy
$\varepsilon_1 < f(x)$. Using (4.2), we obtain that this condition
is satisfied by the points lying inside the spheres with centers at
points $x_s$ and radii

$$r_s = \frac{f(x_s) - \varepsilon_1}{\ell}. \tag{4.4}$$


From the inequality Os a Cea Opel OwiSity nlelse mcrae Gumi emmrer clei re 


are not less than (e-€,)/%. Tet ca ae as E4> then the compari- 


SOnmoOtun (ele) manic (4.4) tino ake Smet inant - = which makes it 


ae 
possible in the given case to cover P by balls of larger radius. 


We complete the covering and obtain $R_k \le \varepsilon$; the
problem is solved. If the initial hypothesis that
$X_\varepsilon \ne \emptyset$ does not hold, we can see it for sure by
the results; however, $R_k$ need not coincide in this case with the
minimum of $\|F(x)\|$ on $P$.
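
For a single equation in one unknown the simplified version can be sketched as follows. This is an illustration only: the function name `find_approximate_root`, the left-to-right scan and the test equations are hypothetical, and the Lipschitz constant is assumed to be known. A ball of radius $(f(x_s) - \varepsilon_1)/\ell$ as in (4.4) is discarded around each evaluated point, and the scan stops as soon as $|F(x)| \le \varepsilon$.

```python
def find_approximate_root(F, a, b, lip, eps, eps1):
    """Return x in [a, b] with |F(x)| <= eps, or None if the covering
    certifies that |F(x)| >= eps1 > 0 everywhere on [a, b].

    F is assumed Lipschitz on [a, b] with constant lip, and 0 < eps1 < eps.
    """
    if not 0.0 < eps1 < eps:
        raise ValueError("need 0 < eps1 < eps")
    gap = (eps - eps1) / lip           # lower bound on every radius while |F| > eps
    x = a
    while True:
        fx = abs(F(x))
        if fx <= eps:                  # approximate solution found
            return x
        covered = x + (fx - eps1) / lip   # radius (4.4): |F| >= eps1 in the ball
        if covered >= b:               # whole interval covered, no root exists
            return None
        x = min(covered + gap, b)      # next ball (radius >= gap) closes the gap


# Hypothetical examples on [0, 3]; |d/dx (x^2 - 2)| <= 6 there.
root = find_approximate_root(lambda x: x * x - 2.0, 0.0, 3.0, 6.0, 1e-3, 5e-4)
none = find_approximate_root(lambda x: x * x + 1.0, 0.0, 3.0, 6.0, 1e-3, 5e-4)
print(root, none)
```

Because every discarded ball has radius at least $(\varepsilon - \varepsilon_1)/\ell$, the scan terminates after finitely many evaluations; a returned `None` shows that the equation has no exact solution on the interval.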
3. PARAMETRIZATION


Solving the problem (2.1) of finding the global minimum on $P$ can
be reduced to finding solutions of the equation

$$\varphi(f(x) - \eta) = 0, \quad x \in P, \tag{4.5}$$

where $\varphi(z)$ is a strictly monotonic increasing function of $z$,
$\varphi(0) = 0$. If the auxiliary parameter $\eta = f_*$, the solution
of (4.5) yields points $x \in X_*$; if $\eta < f_*$, then (4.5) has no
real solutions. One can reduce the problem of finding the global
minimum on $P$ to finding the smallest value $\eta$ for which (4.5)
has a solution. This technique is analogous to the cost-function
parametrization method (see Section 3.3). However, the wisdom of such
an approach in the present case is questionable. The point is
that in finding the global solution to (2.1) it was required to
perform only one covering of $P$, and we obtained an approximate
solution. But for an appropriate choice of the parameter $\eta$ it
is necessary to make a covering of $P$ at least for several values
of $\eta$, which is much more difficult to do than immediately
solving the initial problem. Therefore, it is not sensible to reduce
the problem (2.1) to finding a solution of the equation (4.5).


5. SOLUTION OF MINIMAX PROBLEMS 


We examine here the problem of finding the minimax of
$F(x,y)$ that was studied in Chapter 1:

$$V = \min_{x \in X} \max_{y \in Y} F(x, y). \tag{5.1}$$

By (1.5.8) we define the maximum function $\varphi(x)$, the
point-set mapping $B(x)$ and the set $X_*$.

By Theorem 1.5.2, if $F(x,y)$ satisfies, for every $y \in Y$,
a Lipschitz condition in $x$ with constant $\ell$, the maximum
function $\varphi(x) = F(x, B(x))$, where
$B(x) = \operatorname{Arg\,max}_{y \in Y} F(x,y)$,
defined by (1.5.8), also satisfies a Lipschitz condition with the
same constant $\ell$. This property opens broad possibilities to use
the method of finding global extrema for sequential minimax
problems. The same method can be used sequentially for solving
interior as well as exterior problems. The subprograms of local
search usually are chosen differently, since the functions $F(x,y)$
are often differentiable and their local maximization is carried
out using properties of smoothness of $F$ in $y$. The function
$\varphi$ is only directionally differentiable, and it has to be
locally minimized by other methods.

Comparing the problem (5.1) with the problem of finding the
global extremum, one can conclude that (5.1) has an important
advantage. Indeed, let the value of $\varphi(x_1)$ be known at some
point $x_1$. If for some other point $x_2$ one needs to find the
value of $\varphi(x_2)$, the process of maximization of $F$ in $y$
can be stopped as soon as at least one point $y_2 \in Y$ has been
found such that $F(x_2, y_2) \ge \varphi(x_1)$, since in this case a
priori $\varphi(x_2) \ge \varphi(x_1)$ and the value $\varphi(x_2)$
does not improve the current approximate value of the minimax
$\varphi(x_1)$. This property makes it possible in a number of cases
to terminate the process of solving the interior problem.
We use a very simple scalar case where

$$X = \{x : a \le x \le b\}, \qquad Y = \{y : c \le y \le d\}.$$


For simplicity, we consider the case where both the interior and
exterior problems are solved by the basic method described in
Section 6.2. The accuracy of solving the interior problem
(maximization in $y$) is given by $\varepsilon_1$ and that of the
exterior problem (minimization in $x$) is given by $\varepsilon_2$.

In the process of solving the exterior problem, suppose the
approximate values $\bar\varphi(x_1), \bar\varphi(x_2), \dots, \bar\varphi(x_s)$
are determined, for which we have the estimate

$$\varphi(x_i) - \bar\varphi(x_i) \le \varepsilon_1.$$

We determine the record $R_s$ and the approximate record $\bar R_s$:

$$R_s = \min\,[\varphi(x_1), \dots, \varphi(x_s)],$$
$$\bar R_s = \min\,[\bar\varphi(x_1), \dots, \bar\varphi(x_s)].$$

This implies that $R_s - \bar R_s \le \varepsilon_1$. Passing from
the $s$-th to the $(s+1)$-th point is made by the formula

$$x_{s+1} = x_s + \frac{2\varepsilon_2 + \bar\varphi(x_s) - \bar R_s}{\ell}. \tag{5.2}$$

The solution of the exterior problem continues until
$x_s \ge b$ is satisfied.


In the process of the last solution of the interior problem,
suppose the values $F(x_s, y_1), \dots, F(x_s, y_k)$ are calculated.
We determine the record

$$H_k(x_s) = \max\,[F(x_s, y_1), \dots, F(x_s, y_k)].$$

It is obvious that

$$H_k(x_s) \le \max_{c \le y \le d} F(x_s, y) = \varphi(x_s).$$

If $\bar R_s \le H_k(x_s)$, then further maximization can be stopped
and one can proceed to the new point $x_{s+1}$ using (5.2), taking
there the value $H_k(x_s)$ as $\bar\varphi(x_s)$. Another possibility
is to continue the maximization process in $y$. Then we set

$$y_{k+1} = y_k + \frac{2\varepsilon_1 + H_k(x_s) - F(x_s, y_k)}{\ell}.$$

The process can be continued as long as $y_k < d$. If the last such
point was $y_q$, we change $x$ by (5.2), letting
$\bar\varphi(x_s) = H_q(x_s)$. If $H_q(x_s) > H_k(x_s)$, these
additional computations are justified since the step of variation
of $x$ by (5.2) has grown by the quantity
$(H_q(x_s) - H_k(x_s))/\ell$. The applicability of any version of
the method depends in general on the dimension of the vectors $x$
and $y$, as well as the behavior of $F(x,y)$.
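
The scalar scheme, including the early termination of the interior maximization, might be sketched as below. This is a hedged illustration rather than the author's program: the names `inner_max` and `minimax_1d`, the clipping of the last step to the interval end, and the test function are hypothetical, and one Lipschitz constant is assumed to serve both variables.

```python
import math


def inner_max(F, x, c, d, lip, eps1, stop_level=math.inf):
    """Covering maximisation of F(x, .) on [c, d].

    Stops early once the running record H reaches stop_level, because such
    an x can no longer improve the exterior (minimisation) record.
    """
    y, H = c, -math.inf
    while True:
        val = F(x, y)
        H = max(H, val)                    # record H_k(x_s)
        if H >= stop_level or y >= d:
            return H
        # step y_{k+1} = y_k + (2*eps1 + H_k - F(x_s, y_k)) / lip
        y = min(y + (2.0 * eps1 + H - val) / lip, d)


def minimax_1d(F, a, b, c, d, lip, eps1, eps2):
    """Approximate min over [a, b] of max over [c, d] of F(x, y)."""
    x, R, x_best = a, math.inf, None
    while True:
        H = inner_max(F, x, c, d, lip, eps1, stop_level=R)
        if H < R:                          # interior problem ran to completion
            R, x_best = H, x
        if x >= b:
            return R, x_best
        # step (5.2), with H used as the approximate value of phi(x_s)
        x = min(x + (2.0 * eps2 + H - R) / lip, b)


# Hypothetical test: F = (x - y)^2 on [0, 1] x [0, 1]; the minimax is 0.25 at x = 0.5.
R, x_best = minimax_1d(lambda x, y: (x - y) ** 2, 0.0, 1.0, 0.0, 1.0,
                       lip=2.0, eps1=0.01, eps2=0.01)
print(R, x_best)
```

Under the stated Lipschitz assumption the returned record differs from the true minimax by at most $\max(\varepsilon_1, \varepsilon_2)$: it overshoots by no more than $\varepsilon_2$ because of the covering in $x$, and undershoots by no more than $\varepsilon_1$ because the interior problem at the record point was solved to that accuracy.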
Theoretically, this approach makes it possible to solve
sequential minimax problems and opens the door to solving the
problems of discrete approximation of differential games. The
exploitation of local methods can, as usual, enhance the
effectiveness of the basic method.


6. SOLUTION OF MULTICRITERIA PROBLEMS


1. THE STATEMENT OF THE PROBLEM


Practical construction of complex multipurpose program packages
leads one to solving so-called multicriteria problems. Let
$x \in \mathbb{R}^n$ be the vector of parameters of the product to be
manufactured and the set $X$ be given to which the vectors $x$ have
to belong. The product quality is estimated by the set of
characteristics $F^i(x)$, $i \in [1:m]$. We denote this set by
$F(x)$: $F(x) = [F^1(x), \dots, F^m(x)]$.

The designer prefers to choose a feasible point $x \in X$ such
that all the components of the vector $F(x)$ take on the smallest
possible values. However, this condition is usually unfeasible:
when one component is minimized the other components increase.
Hence the term "solution of the multicriteria optimization problem"
requires special clarification. By

$$\min_{x \in X} F(x) \tag{6.1}$$

we denote the problem of multicriteria minimization of $F(x)$ on
$X$. By solving this problem we mean finding points from the
so-called Pareto set $X_*$. We say that the point $x_*$ belongs to
the Pareto set $X_*$ if $x_* \in X$ and there are no other points
$x \in X$ such that $F^i(x) \le F^i(x_*)$ for all $i \in [1:m]$ and,
for at least one $j \in [1:m]$, the strict inequality
$F^j(x) < F^j(x_*)$ is satisfied. The set of all points having this
property is referred to as the Pareto set and is denoted by $X_*$.


Next we introduce the images of the sets $X$ and $X_*$ under
the mapping $F(x)$:

$$Y = F(X), \qquad Y_* = F(X_*);$$

the set $Y_*$ is the Pareto set for the following very simple
multicriteria problem:

$$\min_{y \in Y} y. \tag{6.2}$$

We say that $X_*$ is the Pareto set in the space of constructive
parameters and its image $Y_*$ is the Pareto set in the space
of criteria.

If it happens that for two points $x_1, x_2 \in X$ the inequalities
$y_1 = F(x_1) \le y_2 = F(x_2)$, $y_1 \ne y_2$ are satisfied, we say
that the point $y_1$ is more efficient than $y_2$, or that $y_2$ is
less efficient than $y_1$.

We assume that each component of $F$ satisfies a Lipschitz
condition with the same constant $\ell$, i.e., for any $x_1$ and $x_2$
we have

$$|F^i(x_1) - F^i(x_2)| \le \ell \|x_1 - x_2\|, \quad i \in [1:m],$$

yielding the vector inequality

$$F(x_1) - \ell e \|x_1 - x_2\| \le F(x_2), \tag{6.3}$$

where $e \in \mathbb{R}^m$ is the vector of all 1's.


2. THE CONSTRUCTION OF THE NET


The structure of the Pareto set even for the simplest problems
turns out, as a rule, to be very complex. This set is frequently
nonconvex and non-simply connected. Hence it is hard to
approximate. We will attempt to construct a finite set $A_k$
reminiscent of the usual notion of an $\varepsilon$-net of the set
$Y_*$. Take a set of points

$$A_k = [y_1, \dots, y_k], \quad y_i = F(x_i), \quad x_i \in X, \quad i \in [1:k].$$

We assume that along with $A_k$ the corresponding set of points $x_i$
from the feasible set $X$ is stored in the computer memory, or is
easily calculated.

Besides feasibility, we impose two other conditions on the
set of points $A_k$:

1. for any $y_* \in Y_*$ there exists a vector $y_i \in A_k$ such
that

$$y_i - \varepsilon e \le y_*; \tag{6.4}$$

2. for any $y_i \in A_k$ there does not exist a vector
$y_j \in A_k$, $j \ne i$, such that $y_j \le y_i$.

We say that the set of points $A_k$ satisfying these conditions
is an $\varepsilon$-net of the Pareto set, and refer to the conditions
as the first and second net conditions, respectively.


For $y_i \in Y$ we define the set

$$M_i = \{y \in \mathbb{R}^m : y_i \le y + \varepsilon e\},$$

which contains all the points less efficient than
$y_i - \varepsilon e$. Let $y_i \in A_k$. Then the set $M_i$ is of no
interest from the viewpoint of constructing the $\varepsilon$-net.
Indeed, each $y_*$ in $Y_*$ that belongs to $M_i$ satisfies the first
net condition (6.4). Hence the set $M_i$ can be omitted from
consideration. We take the union of the sets $M_i$ over all points of
$A_k$ and denote it by $Z_k = \bigcup_{i=1}^{k} M_i$. This set can be
expressed in terms of

$$Z_k = \Bigl\{y \in \mathbb{R}^m : \max_{i \in [1:k]} \min_{j \in [1:m]} \bigl(y^j - y_i^j + \varepsilon\bigr) \ge 0\Bigr\}.$$
The set $A_k$ varies in the computational process. If we
could find a point $y \in Y$ such that $y \le y_i$, where
$y_i \in A_k$, we take $y_i$ out of $A_k$ and replace it by $y$.
Several points can be removed simultaneously. Thanks to this, the
second net condition of the Pareto set holds automatically. On the
other hand, if it happens that $y$ does not belong to $Z_k$, then it
is included in $A_k$, which is now written $A_{k+1}$. If, as a result
of the construction of the set $A_k$, it happens that

$$Y \subset Z_k, \tag{6.5}$$

then $A_k$ forms the $\varepsilon$-net of the Pareto set. Indeed, for
each point $y_* \in Y_*$ we can find a point $y_i \in A_k$ such that
(6.4) holds. The problem of constructing the $\varepsilon$-net of the
Pareto set has thus been reduced to constructing the set of points
$A_k$ satisfying (6.5). As in the preceding sections, to do this we
use the corollary (6.3) of the Lipschitz condition.
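
The update rule for the set $A_k$ can be sketched for a finite stream of criteria vectors. The code below is an illustration under assumptions: the function names, the representation of vectors as tuples, and the sampled two-criteria problem are hypothetical; only the replace-or-insert logic follows the description above.

```python
def dominates(u, v):
    """True if u <= v componentwise and u != v (u is more efficient than v)."""
    return all(a <= b for a, b in zip(u, v)) and tuple(u) != tuple(v)


def in_Z(y, net, eps):
    """True if y lies in some M_i, i.e. y_i - eps*e <= y for some y_i in the net."""
    return any(all(a - eps <= b for a, b in zip(y_i, y)) for y_i in net)


def update_net(net, y, eps):
    """Process one new vector y and return the updated net."""
    y = tuple(y)
    kept = [y_i for y_i in net if not dominates(y, y_i)]   # drop dominated points
    if len(kept) < len(net) or not in_Z(y, kept, eps):
        kept.append(y)              # y replaces dominated points or extends the net
    return kept


# Hypothetical two-criteria example: F(x) = (x^2, (x - 2)^2), x sampled on [-1, 3].
eps = 0.05
samples = []
for i in range(201):
    x = -1.0 + 4.0 * i / 200.0
    samples.append((x * x, (x - 2.0) ** 2))

net = []
for y in samples:
    net = update_net(net, y, eps)
print(len(net), "net points out of", len(samples), "samples")
```

For the processed samples the resulting list satisfies both net conditions: no element is dominated by another, and every sample lies in some $M_i$. Covering the whole set $Y$, as required by (6.5), additionally needs the Lipschitz-based balls discussed next.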

At $x_s \in X$, let the value $y_s = F(x_s)$ be calculated, and
suppose it turned out that $y_s \in M_i$. From (6.3) it follows that

$$F(x_s) - \ell e \|x_s - x\| \le F(x).$$

If $x$ is such that

$$F(x_i) - \varepsilon e \le F(x_s) - \ell e \|x_s - x\|,$$

then $y = F(x) \in M_i$. Hence all points $x$ satisfying

$$\ell e \|x_s - x\| \le F(x_s) - F(x_i) + \varepsilon e \tag{6.6}$$

can be omitted in covering the set $X$. This set contains a ball
with center at $x_s$:

$$B_i = \Bigl\{x \in \mathbb{R}^n : \|x - x_s\| \le \frac{\varepsilon}{\ell} + \frac{1}{\ell} \min_{j \in [1:m]} \bigl[F^j(x_s) - F^j(x_i)\bigr]\Bigr\}.$$

If $y_s = y_i$, then the radius is smallest, equal to
$\varepsilon/\ell$. In the case when $A_k$ contains several points
more efficient than $y_s$, we introduce the index set

$$I(y_s) = \{i \in [1:k] : y_i \le y_s,\ y_i \in A_k\}.$$

This set contains the whole collection of indices of vectors in
$A_k$ which are more efficient than $y_s$. If $I(y_s)$ is nonempty,
then after determining $y_s = F(x_s)$ one can omit all the points
$x$ for which (6.6) holds for at least one $i \in I(y_s)$. Hence it is
optimal to choose an $i$ such that the corresponding ball $B_i$ has
the largest radius. This radius is computed by the formula

$$r_s = \frac{\varepsilon}{\ell} + \frac{1}{\ell} \max_{i \in I(y_s)} \min_{j \in [1:m]} \bigl(F^j(x_s) - F^j(x_i)\bigr). \tag{6.7}$$


Construction of the $\varepsilon$-net of the Pareto set has thus been
reduced to covering the set $X$ by balls of the radius (6.7). To
implement this process one can employ the techniques described in
Sections 7.2 and 7.3. If $X$ is bounded, then its covering is
accomplished in a finite number of steps, and the $\varepsilon$-net
will also be finite. Here, and in all the cases considered earlier in
this chapter, in order to speed up the computations it is useful to
use the methods of local search. Such methods for determining the
points in the Pareto set are now being successfully developed.
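
Formula (6.7) is easy to evaluate once the net is stored. The following sketch is illustrative only: the function name `covering_radius`, the tuple representation of criteria vectors and the numbers in the example are hypothetical.

```python
def covering_radius(y_s, net, eps, lip):
    """Radius (6.7) of the ball around x_s that can be omitted.

    y_s : criteria vector F(x_s) at the current point
    net : list of criteria vectors y_i = F(x_i) currently in the net
    Returns None when no net point satisfies y_i <= y_s, i.e. I(y_s) is empty.
    """
    index_set = [y_i for y_i in net
                 if all(a <= b for a, b in zip(y_i, y_s))]      # I(y_s)
    if not index_set:
        return None
    best = max(min(b - a for a, b in zip(y_i, y_s)) for y_i in index_set)
    return (eps + best) / lip


# Hypothetical data: two net points are more efficient than y_s = (3, 4).
r = covering_radius((3.0, 4.0), [(1.0, 3.0), (0.0, 2.0), (5.0, 0.0)],
                    eps=0.5, lip=2.0)
print(r)
```

When $y_s$ itself belongs to the net the function returns the smallest radius $\varepsilon/\ell$, in agreement with the remark following (6.6).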

The set $A_k$ thus found is provided to the customer who,
according to his own considerations, chooses the most preferable
set of design parameters. If $A_k$ turns out to be too large, it
can be reduced by discarding close points. The distance between
points can be defined in the space of criteria and in the space
of design parameters. The user gives a number $N$ determining
the smallest distance between the points, and a special program
sifts through the set $A_k$, leaving only those points which are
separated by a distance greater than $N$.


The basic result of this section is embodied in the following
assertion: approximate solution of the multicriteria problem
(6.1), in terms of computational labor, is equivalent to the
problem of finding the global minimum (1.1). There is, of course,
some complication connected with the fact that it is necessary to
calculate $m$ values of $F(x)$ instead of calculating the value
of $f(x)$, as well as to store the set of points $A_k$. However, the
basic computations with respect to the covering of $X$ are roughly
the same.


In the literature on multicriteria optimization, the following
three ideas for solving the problem are most popular.

1. Forming the convolution of criteria. For example, under
the assumption that all criteria are nonnegative, the convolution
is:

$$P(x, \alpha) = \sum_{i=1}^{m} \alpha_i F^i(x), \qquad \alpha_i \ge 0, \quad \sum_{i=1}^{m} \alpha_i = 1.$$

Then for a fixed set $\alpha$ one solves the problem

$$\min_{x \in X} P(x, \alpha) = P(x(\alpha), \alpha) \tag{6.8}$$

and then has $x(\alpha) \in X_*$. For another vector $\alpha$ a new
problem of seeking the minimum of $P(x, \alpha)$ is solved, etc. In
the case when the required number of points from the Pareto set is
greater than one, this approach is not efficient since its
implementation requires a multiple search for the global minimum.


2. One can pose the problem of nonlinear programming of
finding $\min_{x \in W} F^1(x)$, where the feasible set
$W = \{x \in X : F^i(x) \le \alpha_i,\ i \in [2:m]\}$ is specified by
specifying the vector $\alpha$. Again, to obtain only one point from
$X_*$ one has to solve a global minimum problem.


3. The parametrized function is: 


m ‘ F 
Pine ey olan a> ea 


i=1 


and its minimum is sought on the set $X$ for equivalent sets of
$\alpha$. This approach also requires a correctly specified vector
$\alpha$ and leads to a large volume of calculations.


All these approaches can be useful if they are employed as
local procedures for determining the initial approximate
$\varepsilon$-net of the Pareto set. Then the succeeding solutions of
the auxiliary problems do not require a large volume of
calculations.
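The cost of the convolution approach can be illustrated by a minimal sketch. It assumes a linear convolution with nonnegative weights that sum to one, two hypothetical criteria on X = [0, 1], and a plain grid search standing in for a genuine global minimization; none of these choices is taken from the text.

```python
# Sketch: every weight vector a requires its own global search over X.
# The criteria f1, f2, the grid and the weights are illustrative assumptions.

def f1(x):
    return x * x              # first criterion, minimized at x = 0


def f2(x):
    return (x - 1.0) ** 2     # second criterion, minimized at x = 1


def convolution(x, a):
    """Linear convolution P(x, a) = a[0]*f1(x) + a[1]*f2(x)."""
    return a[0] * f1(x) + a[1] * f2(x)


def global_min_on_grid(a, n=1001):
    """Brute-force stand-in for a global minimization of P(., a) on [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    return min(grid, key=lambda x: convolution(x, a))


def pareto_points(weights):
    """One complete global search is performed per weight vector."""
    return [global_min_on_grid(a) for a in weights]


if __name__ == "__main__":
    weights = [(k / 4, 1 - k / 4) for k in range(5)]
    for a, x in zip(weights, pareto_points(weights)):
        print(a, x, f1(x), f2(x))
```

Each call to `global_min_on_grid` scans the whole grid, so the total work grows in proportion to the number of weight vectors. This is the inefficiency noted for the first approach.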

This concludes the description of a new trend in global opti- 
mization. The results of this chapter should convincingly demon- 
strate the universality as well as the potential of the method of 


nonuniform coverings. 
Appendix I


DIFFERENTIABILITY 


1. DIFFERENTIABLE FUNCTIONS 


Let the real function f(x) of the n-dimensional vector $x \in R^n$
be defined on an open set X. We say that f(x) is differentiable
at $x \in X$ if there exists a bounded vector $p \in R^n$ such that
for all $h \in R^n$ satisfying the condition $x + h \in X$ we have

    $f(x+h) - f(x) = \langle p, h \rangle + \alpha(x, h) \|h\|$,          (A1.1)

where the function $\alpha$ has the property

    $\lim_{\|h\| \to 0} \alpha(x, h) = 0$.

Suppose that f is differentiable at x. Then f is continuous
at x, its partial derivatives with respect to all coordinates
exist and the n-dimensional vector

    $f_x(x) = \left[ \frac{\partial f(x)}{\partial x^1}, \frac{\partial f(x)}{\partial x^2}, \ldots, \frac{\partial f(x)}{\partial x^n} \right]^T$,

which is called the gradient of f at x, is defined. In (A1.1)
one can put $p = f_x(x)$.

We say that f is differentiable on an open set X if it is
differentiable at each $x \in X$.

By f being differentiable on the set X (perhaps closed)
we mean that f is differentiable on an open set containing X.

We say that f is differentiable (everywhere) if it is
differentiable at each $x \in R^n$.


A function f(x) that has continuous partial derivatives at
a point x (or on an open set X) is said to be continuously
differentiable at x (or on X).

Let h denote an n-dimensional vector with norm 1. The
limit

    $\lim_{t \to +0} \frac{f(x + th) - f(x)}{t}$,

if it exists, is said to be the derivative of f(x) at x in the
direction h and is denoted $\partial f(x) / \partial h$. If f is
differentiable at x, it has derivatives in any direction at x.
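For a differentiable function the derivative in the direction h equals the scalar product of the gradient with h, a standard consequence of (A1.1). The sketch below compares the one-sided difference quotient from the definition with that scalar product; the test function, the point, and the step t are illustrative assumptions.

```python
def f(x):
    """Illustrative smooth function f(x) = (x1)^2 + 3*x1*x2 (an assumption)."""
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


def inner(u, v):
    """Scalar product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def directional_derivative(func, x, h, t=1e-6):
    """One-sided difference quotient (func(x + t*h) - func(x)) / t, small t > 0."""
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (func(shifted) - func(x)) / t


if __name__ == "__main__":
    x = [1.0, 2.0]
    h = [0.6, 0.8]  # unit vector
    print(directional_derivative(f, x, h), inner(grad_f(x), h))
```

The two printed numbers agree up to a term of order t, which is the behaviour the limit in the definition describes.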


THEOREM A.1. If f(x) is differentiable on an open convex set
$X \subset R^n$ and $x_1, x_2 \in X$, then there exists a number
$0 < \tau < 1$ such that the following equalities hold:

    $f(x_2) - f(x_1) = \langle f_x(x_1 + \tau (x_2 - x_1)), x_2 - x_1 \rangle$,          (A1.2)

    $f(x_2) - f(x_1) = \int_0^1 \langle f_x(x_1 + \theta (x_2 - x_1)), x_2 - x_1 \rangle \, d\theta$.          (A1.3)

The formula (A1.2) is usually known as the Lagrange formula,
and (A1.3) as the Newton-Leibniz formula.
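The Newton-Leibniz formula (A1.3) can be checked numerically. The following sketch integrates the scalar product of the gradient with the displacement along the segment by the midpoint rule; the function f, the end points, and the number of steps are assumptions made for illustration.

```python
def f(x):
    """Illustrative function f(x) = (x1)^3 + x1*x2 (an assumption)."""
    return x[0] ** 3 + x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [3.0 * x[0] ** 2 + x[1], x[0]]


def newton_leibniz(x1, x2, steps=2000):
    """Midpoint-rule value of the integral on the right-hand side of (A1.3)."""
    d = [b - a for a, b in zip(x1, x2)]
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) / steps
        point = [a + theta * di for a, di in zip(x1, d)]
        total += sum(gi * di for gi, di in zip(grad_f(point), d))
    return total / steps


if __name__ == "__main__":
    x1, x2 = [0.0, 1.0], [2.0, 3.0]
    print(f(x2) - f(x1), newton_leibniz(x1, x2))
```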
We say that a function f(x) defined on X satisfies a
Lipschitz condition with constant $\ell$ on X if for any
$x_1, x_2 \in X$ we have the inequality

    $|f(x_1) - f(x_2)| \le \ell \|x_1 - x_2\|$.          (A1.4)

It is obvious that a function f satisfying a Lipschitz
condition on X is continuous on X.

THEOREM A.2. Let f(x) be differentiable on an open convex set
X, where its gradient $f_x(x)$ satisfies a Lipschitz condition
with constant $\ell$. Then for any $x_1, x_2 \in X$ we have

    $f(x_2) - f(x_1) \le \langle f_x(x_1), x_2 - x_1 \rangle + \ell \|x_2 - x_1\|^2$,

    $f(x_2) - f(x_1) = \langle f_x(x_1), x_2 - x_1 \rangle + \eta(x_2 - x_1)$,          (A1.5)

where the function $\eta(z)$ is such that $\eta(0) = 0$ and
$|\eta(z)| \le \ell \|z\|^2$.
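The estimate of the remainder in Theorem A.2 can be tested on a concrete function. The sketch below uses f(x) = sin(x1) + (x2)^2, whose gradient satisfies a Lipschitz condition with constant 2, and checks on random pairs of points that the remainder of the linear approximation does not exceed the constant times the squared distance. The function, the sampling box, and the tolerance are assumptions for illustration.

```python
import math
import random

LIPSCHITZ = 2.0  # Lipschitz constant of the gradient of f below


def f(x):
    return math.sin(x[0]) + x[1] ** 2


def grad_f(x):
    return [math.cos(x[0]), 2.0 * x[1]]


def remainder(x1, x2):
    """Return (eta, |z|^2) with z = x2 - x1 and eta = f(x2) - f(x1) - <f_x(x1), z>."""
    z = [b - a for a, b in zip(x1, x2)]
    linear = sum(g * zi for g, zi in zip(grad_f(x1), z))
    return f(x2) - f(x1) - linear, sum(zi * zi for zi in z)


def bound_holds(trials=1000, seed=0):
    """Check |eta| <= LIPSCHITZ * |z|^2 on random pairs of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
        x2 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
        eta, z_squared = remainder(x1, x2)
        if abs(eta) > LIPSCHITZ * z_squared + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print(bound_holds())
```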


2. TWICE DIFFERENTIABLE FUNCTIONS 


Let f(x) be defined on an open set $X \subset R^n$. We say that f(x)
is twice differentiable at $x \in X$ if for all $h \in R^n$ satisfying
$x + h \in X$ we have

    $f(x+h) - f(x) = f_x^T(x) h + \frac{1}{2} h^T f_{xx}(x) h + \|h\|^2 \beta(x, h)$,

where the function $\beta$ has the property

    $\lim_{\|h\| \to 0} \beta(x, h) = 0$.

Here $f_{xx}(x)$ is a symmetric $n \times n$ matrix of second
derivatives of f at x (the Hessian) with the (i,j)th element

    $\frac{\partial^2 f(x)}{\partial x^i \partial x^j}$.
THEOREM A.3 (Taylor's Formula). If f is twice differentiable
on an open convex set $X \subset R^n$ and $x_1, x_2 \in X$, then there
exists a number $0 < \tau < 1$ such that we have the equality

    $f(x_2) - f(x_1) = f_x^T(x_1)(x_2 - x_1) + \frac{1}{2} (x_2 - x_1)^T f_{xx}(x_1 + \tau (x_2 - x_1)) (x_2 - x_1)$.


3. DIFFERENTIABILITY OF MAPPINGS 


Let a mapping $g: X \to R^m$ be given, where X is a set in $R^n$. We
say that the mapping g is differentiable at an interior point
x of X if there exists an $n \times m$ matrix A such that for all
$h \in R^n$ satisfying the condition $x + h \in X$ we have

    $g(x+h) - g(x) = A^T h + \|h\| \theta(x, h)$,

where the vector function $\theta$ has the property

    $\lim_{\|h\| \to 0} \|\theta(x, h)\| = 0$.

The matrix A is called the derivative of the mapping g(x)
at x and denoted $g_x(x)$.

The differentiability defined in this way is often called
Fréchet differentiability. If the mapping $g: X \to R^m$ is
differentiable at $x \in X \subset R^n$, then the condition

    $\lim_{\|h\| \to 0} \frac{\|g(x+h) - g(x) - g_x^T(x) h\|}{\|h\|} = 0$

is satisfied.

The Newton-Leibniz formula (A1.3) extends to the case of
mappings, while (A1.2) does not. Hence the following inequality is
what we call the Lagrange formula for mappings:

    $\|g(y) - g(x)\| \le \sup_{0 \le \tau \le 1} \|g_x(x + \tau (y - x))\| \, \|y - x\|$.          (A1.6)

THEOREM A.4. Let the mapping $g: X \to R^m$ be differentiable on an
open convex set X. Then for any $x, y \in X$ the inequality (A1.6)
holds.
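A simple example shows why the equality (A1.2) fails for mappings while the inequality (A1.6) survives. For the mapping g(t) = (cos t, sin t) of the real line into the plane, g(2π) − g(0) = 0, yet the derivative has norm 1 everywhere, so no intermediate point can reproduce the increment exactly. The sketch below evaluates both sides of (A1.6); the mapping and the sampling of the segment are illustrative assumptions.

```python
import math


def g(t):
    """Illustrative mapping of the real line into the plane."""
    return (math.cos(t), math.sin(t))


def g_prime(t):
    """Derivative of g; its Euclidean norm equals 1 for every t."""
    return (-math.sin(t), math.cos(t))


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def lagrange_check(x, y, samples=1000):
    """Return (left side, right side) of inequality (A1.6) for g on [x, y]."""
    lhs = norm([a - b for a, b in zip(g(y), g(x))])
    sup = max(norm(g_prime(x + (k / samples) * (y - x)))
              for k in range(samples + 1))
    return lhs, sup * abs(y - x)


if __name__ == "__main__":
    print(lagrange_check(0.0, 2.0 * math.pi))
```

The left side is zero up to rounding, while the right side is 2π, so (A1.6) holds as a strict inequality here.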


If for any $x_1, x_2 \in R^n$ we have

    $\|g(x_1) - g(x_2)\| \le \ell \|x_1 - x_2\|$,

g(x) is called a Lipschitz mapping with constant $\ell$.


4, DIFFERENTIABILITY OF COMPOSITE FUNCTIONS 


THEOREM A.5. Let the function $g: X \to R^m$ be defined on an
open set $X \subset R^n$ and let the function $\phi$ be defined on $R^m$.
Then, if g is differentiable at $x \in X$ and $\phi$ is
differentiable at $y = g(x)$, the composite function
$\psi(x) = \phi(g(x))$ is differentiable at x and

    $\frac{d\psi(x)}{dx} = g_x(x) \, \phi_y(g(x))$.

This result carries over to the case where $\psi(x) = \phi(x, g(x))$,
where $\phi$ is defined on the Cartesian product $R^n \times R^m$. The
formula for calculating the derivative of the composite function
$\psi(x)$ has the form

    $\psi_x(x) = \phi_x(x, g(x)) + g_x(x) \, \phi_g(x, g(x))$.

Here $\phi_x$ is the partial derivative of $\phi$ with respect to the
explicit vector x and $\phi_g$ is the partial derivative of
$\phi(x, g)$ with respect to g.
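The composite-function formula can be verified numerically. The sketch below assumes that the derivative of g is stored as the n × m matrix whose (i, j) element is the partial derivative of the j-th component of g with respect to the i-th coordinate, so that the product of this matrix with the gradient of φ is an n-vector. The particular functions g and φ and the difference step are illustrative assumptions.

```python
def g(x):
    """Illustrative mapping g: R^2 -> R^3."""
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]


def g_x(x):
    """Assumed n x m (here 2 x 3) matrix with elements d g^j / d x^i."""
    return [[x[1], 1.0, 2.0 * x[0]],
            [x[0], 1.0, 0.0]]


def phi(y):
    return y[0] + y[1] * y[2]


def phi_y(y):
    """Gradient of phi with respect to y."""
    return [1.0, y[2], y[1]]


def psi(x):
    return phi(g(x))


def psi_x_chain(x):
    """Gradient of psi from the chain rule: g_x(x) * phi_y(g(x))."""
    grad_phi = phi_y(g(x))
    return [sum(row[j] * grad_phi[j] for j in range(len(grad_phi)))
            for row in g_x(x)]


def psi_x_numeric(x, step=1e-5):
    """Central-difference approximation of the gradient of psi."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += step
        minus[i] -= step
        grad.append((psi(plus) - psi(minus)) / (2.0 * step))
    return grad


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(psi_x_chain(x), psi_x_numeric(x))
```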


Appendix II


SOME PROPERTIES OF MATRICES 


1. 


By a matrix A we mean a rectangular table of numbers

    $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$.

If m = n, the matrix is called square, the number m = n
is called its order. In the general case when m and n are
not equal, the matrix is called a rectangular matrix of dimension
$m \times n$. The numbers forming the matrix are called its elements
(components). At the intersection of the i-th row and the j-th
column there is the element $a_{ij}$ which is called the (i,j)th
element of A. The elements $a_{ii}$, $i \in [1:n]$, form the principal
diagonal of a square matrix A.

Let A and B have elements $a_{ij}$ and $b_{jq}$, respectively,
where $i \in [1:m]$, $j \in [1:n]$. Then, by the product of A by B
we mean the matrix C whose (i,q) element is

    $c_{iq} = \sum_{j=1}^{n} a_{ij} b_{jq}$.
Let D(z) denote the diagonal matrix, all off-diagonal
elements of which are zero and the i-th diagonal element is $z^i$
(the i-th component of the vector z). The order of D(z) is
determined by the dimension of the vector z.

If C = AD(z), B = D(z)A, then $c_{ij} = a_{ij} z^j$, $b_{ij} = z^i a_{ij}$.
Thus, when an $m \times n$ rectangular matrix A is multiplied on the
right by the diagonal matrix D(z) of order n, all columns of A
are multiplied by the numbers $z^1, z^2, \ldots, z^n$. When the matrix A
is multiplied on the left by a diagonal matrix D(z), all rows
of A are multiplied by the numbers $z^1, z^2, \ldots, z^m$.
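The definition of the matrix product and the effect of diagonal factors can be seen in a few lines. The helper names `diag` and `matmul` and the sample data below are chosen for this illustration only.

```python
def diag(z):
    """Diagonal matrix D(z) built from the components of the vector z."""
    n = len(z)
    return [[z[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Matrix product: the (i, q) element is the sum over j of a_ij * b_jq."""
    return [[sum(A[i][j] * B[j][q] for j in range(len(B)))
             for q in range(len(B[0]))]
            for i in range(len(A))]


if __name__ == "__main__":
    A = [[1, 2, 3],
         [4, 5, 6]]
    print(matmul(A, diag([10, 100, 1000])))  # columns of A are scaled
    print(matmul(diag([2, 3]), A))           # rows of A are scaled
```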


An $m \times n$ matrix, all elements of which are zero, is called
the zero matrix and denoted $0_{mn}$. A square matrix of order n,
with 1 on the main diagonal and zero elsewhere, is called an
identity matrix and denoted $I_n$.

A square matrix A of order n is called symmetric if
$a_{ij} = a_{ji}$ for all $i, j \in [1:n]$. A symmetric matrix coincides
with its transpose: $A = A^T$.


Let A be a square matrix of order n. The matrix A is
called degenerate (or singular) if its determinant is equal to
zero: |A| = 0. The roots of the equation

    $|A - \lambda I_n| = 0$          (A2.1)

are called the characteristic values or eigenvalues of A. The
equation (A2.1) is called the characteristic equation of the
matrix A. The determinant and the eigenvalues of A are
continuous functions of the elements of A. If the equation

    $Ax = \lambda x$          (A2.2)

has a nonzero solution x, then x is called an eigenvector of
A corresponding to the eigenvalue $\lambda$. The matrices A and $A^T$
have identical eigenvalues.

If A is a real symmetric matrix, then all its eigenvalues
are real and we have the inequalities

    $\lambda_1 x^T x \le x^T A x \le \lambda_n x^T x$,

where $\lambda_1 \le \ldots \le \lambda_n$ are the eigenvalues of A. If
$\lambda_1 > 0$, A is positive definite (A > 0); if $\lambda_1 \ge 0$, A is
said to be positive semidefinite or nonnegative definite ($A \ge 0$).
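The two-sided estimate of a quadratic form by the extreme eigenvalues can be checked for symmetric matrices of order two, where the eigenvalues have a closed form. The matrices, the random sampling, and the tolerance below are assumptions for illustration.

```python
import math
import random


def eigenvalues_sym2(a, b, c):
    """Eigenvalues, in ascending order, of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius, mean + radius


def quadratic_form(a, b, c, x):
    """Value of x^T A x for A = [[a, b], [b, c]]."""
    return a * x[0] ** 2 + 2.0 * b * x[0] * x[1] + c * x[1] ** 2


def rayleigh_bounds_hold(a, b, c, trials=1000, seed=0):
    """Check lam_min * x^T x <= x^T A x <= lam_max * x^T x on random vectors."""
    lam_min, lam_max = eigenvalues_sym2(a, b, c)
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        xx = x[0] ** 2 + x[1] ** 2
        q = quadratic_form(a, b, c, x)
        if not (lam_min * xx - 1e-12 <= q <= lam_max * xx + 1e-12):
            return False
    return True


if __name__ == "__main__":
    print(eigenvalues_sym2(2.0, 1.0, 2.0), rayleigh_bounds_hold(2.0, 1.0, 2.0))
```

For the matrix with rows (2, 1) and (1, 2) the smallest eigenvalue is positive, so by the criterion above the matrix is positive definite.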
LEMMA (R. Finsler). If $x^T A x > 0$ for all nonzero x satisfying
the condition $x^T B x = 0$, where B is a nonnegative definite square
matrix, there exists a number $\tau_*$ such that the quadratic form
$x^T A x + \tau x^T B x$ is positive definite for $\tau \ge \tau_*$.
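A two-dimensional illustration of Finsler's lemma may be helpful. The matrices below are chosen for this note and are not taken from the text; the components of x are written u and v.

```latex
A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
x = \begin{pmatrix} u \\ v \end{pmatrix},
\\[4pt]
x^T B x = v^2 = 0 \;\Longrightarrow\; v = 0, \qquad
x^T A x = u^2 > 0 \quad \text{for } x \neq 0,
\\[4pt]
x^T A x + \tau \, x^T B x = u^2 + (\tau - 1)\, v^2 .
```

The matrix A itself is indefinite, yet the combined form is positive definite for every value of the multiplier greater than 1, so any threshold exceeding 1 serves in the lemma.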

The matrix B is called a square root of A if A = BB.
Each positive definite matrix has such a positive definite square
root. If the square matrix A is positive definite and B is
nonsingular, then $B A B^T$ is positive definite. In particular, for
any nonsingular matrix B the matrix $B B^T$ is positive definite.


If A and B are square matrices of the same order, the
following formulas hold true:

    $|AB| = |A| \, |B|$,     $(AB)^T = B^T A^T$.

The matrix $A^{-1}$ is called the inverse of A if $A A^{-1} = I_n$.
Any nonsingular matrix has an inverse. If A and B are
nonsingular matrices of the same order, then

    $|A^{-1}| = |A|^{-1}$,     $(A^T)^{-1} = (A^{-1})^T$,     $(AB)^{-1} = B^{-1} A^{-1}$.
2.





A mapping $\|\cdot\|: R^n \to R^1$ is called a norm if for any
$x \in R^n$ the following three conditions are satisfied:
1. $0 \le \|x\|$, and $\|x\| = 0$ iff x = 0;
2. $\|\alpha x\| = |\alpha| \, \|x\|$, $\alpha \in R^1$;
3. $\|x + y\| \le \|x\| + \|y\|$ for any $x, y \in R^n$.
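For the scalar product in $E^n$ the Cauchy inequality is
satisfied:

    $\langle x, y \rangle^2 \le \langle x, x \rangle \langle y, y \rangle$.

For any vector norm $\|\cdot\|$ in $R^n$ there exists a vector norm
$\|\cdot\|_*$, called the dual or the conjugate norm to $\|\cdot\|$, which
is defined by the condition

    $\|x\|_* = \sup_{\|y\| = 1} \langle x, y \rangle$.

If as $\|\cdot\|$ one takes the Hölder norm

    $\|x\|_p = \left( \sum_{i=1}^{n} |x^i|^p \right)^{1/p}$,

then the dual norm is given by

    $\|x\|_q = \left( \sum_{i=1}^{n} |x^i|^q \right)^{1/q}$,     $\frac{1}{p} + \frac{1}{q} = 1$,

and we have:

    $|\langle x, y \rangle| \le \|x\|_p \, \|y\|_q$.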


In particular, for the so-called sum norm

    $\|x\|_1 = \sum_{i=1}^{n} |x^i|$

the dual is the norm

    $\|x\|_\infty = \max_{i \in [1:n]} |x^i|$,

usually called the Chebyshev norm or maximum norm.

The Hölder norm and the dual norm coincide for p = q = 2:

    $\|x\|_2 = \left( \sum_{i=1}^{n} (x^i)^2 \right)^{1/2}$.

Such a norm is called Euclidean or spherical. It can be obtained
starting from the scalar product:

    $\|x\|_2 = \langle x, x \rangle^{1/2}$.

For the norms of n-dimensional vectors thus defined, we have the
following estimates:

    $\|x\|_\infty \le \|x\|_1 \le n \|x\|_\infty$,
    $\|x\|_\infty \le \|x\|_2 \le \sqrt{n} \, \|x\|_\infty$,
    $\|x\|_2 \le \|x\|_1 \le \sqrt{n} \, \|x\|_2$.

Hence

    $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$.
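The relations among the sum, Euclidean, and maximum norms are standard and easy to test. The sketch below evaluates the three norms and checks the usual comparison estimates together with the inequality between a scalar product and a pair of dual norms; the sample vectors and the tolerance are assumptions for illustration.

```python
import math


def norm1(x):
    """Sum norm."""
    return sum(abs(c) for c in x)


def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in x))


def norm_inf(x):
    """Chebyshev (maximum) norm."""
    return max(abs(c) for c in x)


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def estimates_hold(x, tol=1e-12):
    """Check the standard comparison estimates for the three norms of x."""
    n = len(x)
    return (norm_inf(x) <= norm2(x) + tol
            and norm2(x) <= norm1(x) + tol
            and norm1(x) <= math.sqrt(n) * norm2(x) + tol
            and norm2(x) <= math.sqrt(n) * norm_inf(x) + tol
            and norm1(x) <= n * norm_inf(x) + tol)


if __name__ == "__main__":
    x = [3.0, -4.0, 12.0]
    y = [1.0, -2.0, 0.5]
    print(norm1(x), norm2(x), norm_inf(x), estimates_hold(x))
    print(abs(inner(x, y)) <= norm1(x) * norm_inf(y))
```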


Each norm of any matrix A of dimension $m \times n$ must satisfy the
following three conditions:
1. $0 \le \|A\|$, and $\|A\| = 0$ iff $A = 0_{mn}$;
2. $\|\alpha A\| = |\alpha| \cdot \|A\|$, $\alpha \in R^1$;
3. $\|A + B\| \le \|A\| + \|B\|$ for any $m \times n$ matrix B.


We say that the matrix norm $\|\cdot\|$ is associated with the
given vector norm if, for any matrix A of dimension $m \times n$
and any n-dimensional vector x the inequality

    $\|Ax\| \le \|A\| \cdot \|x\|$

is satisfied.
We present the rule for constructing the smallest matrix norm
associated with a given vector norm. For the norm of A we take the
maximum of the vector norms of Ax as x runs through the set of
all vectors with unit norm:

    $\|A\| = \max_{\|x\| = 1} \|Ax\|$.


Starting from this rule one can show that to the vector norms
$\|\cdot\|_1$, $\|\cdot\|_\infty$, $\|\cdot\|_2$ there correspond the following
associated matrix norms:

    $\|A\|_1 = \max_{j \in [1:n]} \sum_{i=1}^{m} |a_{ij}|$,

    $\|A\|_\infty = \max_{i \in [1:m]} \sum_{j=1}^{n} |a_{ij}|$,

    $\|A\|_2 = \sqrt{\lambda}$,

where $\lambda$ is the maximum eigenvalue of $A^T A$. In particular, if A
is an $n \times n$ symmetric matrix with eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_n$, then

    $\|A\|_2 = \max_{i \in [1:n]} |\lambda_i|$.          (A2.3)

By the spectral radius S(A) of the square matrix A we
mean the maximum modulus of the eigenvalues of A. From (A2.2)
and the properties of the norms we obtain

    $|\lambda| \cdot \|x\| = \|\lambda x\| = \|Ax\| \le \|A\| \cdot \|x\|$.

No norm of a matrix is less than its spectral radius. From (A2.3)
it follows that $S(A) = \|A\|_2$ for a symmetric matrix A. At the
same time, one can show that for any given $\varepsilon > 0$ there
always exists a norm $\|A\|_s$ such that $S(A) + \varepsilon \ge \|A\|_s$.

LEMMA (Neumann). Let the spectral radius of the square matrix
A be strictly less than one. Then the matrix $I_n - A$ is
nonsingular and

    $(I_n - A)^{-1} = \lim_{s \to \infty} \sum_{i=0}^{s} A^i$.
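The convergence asserted in Neumann's lemma can be observed on a small example. The sketch below accumulates the partial sums of the powers of a 2 × 2 matrix with spectral radius 0.5 and compares them with the exact inverse; the matrix and the number of terms are assumptions for illustration.

```python
def matmul(A, B):
    """Matrix product of two square matrices of the same order."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_partial_sum(A, s):
    """Sum of the powers A^0 + A^1 + ... + A^s."""
    n = len(A)
    total = identity(n)   # term with i = 0
    power = identity(n)
    for _ in range(s):
        power = matmul(power, A)
        total = [[total[r][c] + power[r][c] for c in range(n)]
                 for r in range(n)]
    return total


if __name__ == "__main__":
    A = [[0.5, 0.25],
         [0.0, 0.5]]          # both eigenvalues equal 0.5
    print(neumann_partial_sum(A, 60))   # approaches [[2, 1], [0, 2]]
```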
Let the matrices A, B and C of dimensions $n \times n$, $n \times m$ and
$m \times n$, respectively, be given, and let the matrix A be
nonsingular and $m \le n$. The matrix A + BC is nonsingular iff
$I_m + C A^{-1} B$ is nonsingular, and if so we have the
Sherman-Morrison-Woodbury formula:

    $(A + BC)^{-1} = A^{-1} - A^{-1} B (I_m + C A^{-1} B)^{-1} C A^{-1}$.          (A2.4)

For m = 1, the vectors b and $c^T$ are taken as B and C,
respectively; then from (A2.4) we have the so-called
Sherman-Morrison formula:

    $(A + b c^T)^{-1} = A^{-1} - \frac{A^{-1} b \, c^T A^{-1}}{1 + c^T A^{-1} b}$.
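The rank-one case can be checked directly. The sketch below updates a known inverse by the Sherman-Morrison formula and compares the result with a direct inversion of the modified matrix; the 2 × 2 data and the helper `inv2` are assumptions made to keep the example short.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix by the explicit formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def sherman_morrison(A_inv, b, c):
    """Inverse of A + b c^T computed from the inverse of A."""
    n = len(b)
    Ab = [sum(A_inv[i][j] * b[j] for j in range(n)) for i in range(n)]   # A^{-1} b
    cA = [sum(c[i] * A_inv[i][j] for i in range(n)) for j in range(n)]   # c^T A^{-1}
    denom = 1.0 + sum(c[i] * Ab[i] for i in range(n))
    if denom == 0:
        raise ValueError("A + b c^T is singular")
    return [[A_inv[i][j] - Ab[i] * cA[j] / denom for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 4.0]]
    b, c = [1.0, 1.0], [1.0, 2.0]
    print(sherman_morrison(inv2(A), b, c))
    print(inv2([[3.0, 2.0], [1.0, 6.0]]))   # direct inverse of A + b c^T
```

The update costs only matrix-vector products once the inverse of A is available, which is the reason such formulas are used in quasi-Newton methods.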


Let the nonsingular matrix M be decomposed into blocks

    $M = \begin{bmatrix} A & B \\ C & P \end{bmatrix}$,

where A, B, C, P have dimensions $n \times n$, $n \times q$, $q \times n$,
$q \times q$, respectively. Let the square matrix A be nonsingular. The
Frobenius formula for the inverse matrix holds:

    $M^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B H^{-1} C A^{-1} & -A^{-1} B H^{-1} \\ -H^{-1} C A^{-1} & H^{-1} \end{bmatrix}$,

where $H = P - C A^{-1} B$.

Using this formula, inversion of a square matrix of order
n+q reduces to inversion of two matrices of orders n and q,
respectively, and to the operations of addition and multiplication
of matrices of dimensions $n \times n$, $n \times q$, $q \times n$, $q \times q$.

If instead of the condition $|A| \ne 0$ we introduce the
assumption that $|P| \ne 0$, the Frobenius formula has a different form:

    $M^{-1} = \begin{bmatrix} K^{-1} & -K^{-1} B P^{-1} \\ -P^{-1} C K^{-1} & P^{-1} + P^{-1} C K^{-1} B P^{-1} \end{bmatrix}$,

where $K = A - B P^{-1} C$.
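The reduction described above can be sketched for a matrix of order 3 split into a 2 × 2 block A, a column B, a row C, and a scalar P. The code uses the standard block-inverse formula built on the quantity P − C A⁻¹ B (a scalar here, named `h` in the code); the helper names and the test matrix are assumptions for illustration.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix by the explicit formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("block A is singular")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def block_inverse(A, B, C, p):
    """Inverse of [[A, B], [C, p]]: A is 2 x 2, B a column, C a row, p a scalar."""
    Ai = inv2(A)
    AiB = [Ai[i][0] * B[0] + Ai[i][1] * B[1] for i in range(2)]   # A^{-1} B
    CAi = [C[0] * Ai[0][j] + C[1] * Ai[1][j] for j in range(2)]   # C A^{-1}
    h = p - (C[0] * AiB[0] + C[1] * AiB[1])                        # P - C A^{-1} B
    if h == 0:
        raise ValueError("the full matrix is singular")
    hi = 1.0 / h
    top = [[Ai[i][j] + AiB[i] * hi * CAi[j] for j in range(2)] + [-AiB[i] * hi]
           for i in range(2)]
    bottom = [-hi * CAi[j] for j in range(2)] + [hi]
    return top + [bottom]


if __name__ == "__main__":
    M = [[2.0, 1.0, 1.0],
         [1.0, 3.0, 0.0],
         [1.0, 0.0, 2.0]]
    M_inv = block_inverse([[2.0, 1.0], [1.0, 3.0]], [1.0, 0.0], [1.0, 0.0], 2.0)
    print(matmul(M, M_inv))   # close to the identity matrix
```

Only a 2 × 2 inverse and a scalar reciprocal are computed, in line with the remark that inversion of order n+q reduces to inversions of orders n and q.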
More detailed information concerning properties of matrices 


may be found, for example, in Gantmacher [1], Lancaster [1], 


Bellman [2], Voevodin [1], and Coddington and Levinson [1]. 


Appendix III 


SOME PROPERTIES OF MAPPINGS 


1. SINGLE-VALUED MAPPINGS 


To each element x of a set $X \subset E^n$ let there correspond a
unique element T(x) of a set $Y \subset E^m$. In this case we say that
we have a single-valued mapping T of the set X into the set
Y. The set of all values taken by T on X is called its
image:

    $T(X) = \{ y \in Y : \exists\, x \in X \text{ such that } y = T(x) \}$.

There exist two equivalent formulations of the notion of
continuity for mappings, one in terms of convergence and one in the
sense of Cauchy.

1. The mapping T is continuous at x if for any sequence
$\{x_k\}$ in X converging to the limit x as $k \to \infty$ the limit
exists:

    $\lim_{k \to \infty} T(x_k) = T(x)$.

2. The mapping T is continuous at x if for any neighborhood
V of T(x) we can find a neighborhood G of x such that
$T(G) \subset V$.


Of special interest are the fixed points of mappings, i.e.,
points belonging to the following set:

    $X_0 = \{ x \in E^n : x = T(x) \}$,

where $T: E^n \to E^n$. In constructing numerical methods of solving
systems of equations, nonlinear programming problems and optimal
control problems, it is often appropriate to make a reduction to a
problem of finding fixed points of certain mappings. One of the
first results on the existence of fixed points of mappings was
obtained at the beginning of the century by the Dutch
mathematician Brouwer.

BROUWER'S FIXED-POINT THEOREM: Let X be a nonempty compact
convex set in $E^n$ and the mapping $T: X \to X$ be continuous. Then
T has a fixed point in X.
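A one-dimensional illustration: the mapping T(x) = cos x is continuous and takes the segment [0, 1] into itself, so Brouwer's theorem guarantees a fixed point there. The theorem asserts existence only; the bisection on x − T(x) used below is a device that works in one dimension and is an assumption of this sketch, not a method from the text.

```python
import math


def T(x):
    """Continuous mapping of the segment [0, 1] into itself."""
    return math.cos(x)


def fixed_point_by_bisection(lo=0.0, hi=1.0, tol=1e-12):
    """Locate a root of x - T(x), which changes sign on [lo, hi]."""
    if (lo - T(lo)) * (hi - T(hi)) > 0:
        raise ValueError("x - T(x) does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (lo - T(lo)) * (mid - T(mid)) <= 0:
            hi = mid      # the sign change lies in [lo, mid]
        else:
            lo = mid      # the sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x = fixed_point_by_bisection()
    print(x, T(x))
```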


2. POINT-SET MAPPINGS 


Suppose some rule W setting each point of the space $E^n$ into
correspondence with a subset of the space $E^m$ is defined. Such a
correspondence is called multivalued or a point-set mapping and is
denoted $W: E^n \to 2^{E^m}$. The set W(x) is called the image of the
point x. Obviously, a single-valued mapping is a particular case
of a point-set mapping under which the image of each point is a
set consisting of a single point. Cases are possible when W is
defined on a set $X \subset E^n$ and to each $x \in X$ there corresponds
a subset of X; then we write $W: X \to 2^X$.

If the notions of continuity in terms of convergence and in
the Cauchy sense are carried over to multivalued mappings, we
arrive at the following two distinct notions.

A multivalued mapping $W: E^n \to 2^{E^m}$ is called closed at x if
the relations $\lim_{k \to \infty} x_k = x$, $\lim_{k \to \infty} y_k = y$ and
$y_k \in W(x_k)$ imply $y \in W(x)$.

A multivalued mapping $W: E^n \to 2^{E^m}$ is called upper
semi-continuous at x if for any neighborhood V of the image W(x)
there is a neighborhood G of x such that $W(G) \subset V$.

A multivalued mapping W is called closed on the set $X \subset E^n$
if it is closed at each $x \in X$. We say that the mapping W is
closed if it is closed at all points at which it is defined. One
extends analogously the notion of upper semi-continuity.

The point $x_*$ is called a fixed point of the point-set
mapping $W: E^n \to 2^{E^n}$ if $x_* \in W(x_*)$.

Brouwer's theorem has been generalized to multivalued mappings:

KAKUTANI'S THEOREM. Let X be a nonempty compact convex set in
$E^n$ and let $W: X \to 2^X$ be a multivalued mapping satisfying the
conditions:
a. for each $x \in X$ the set W(x) is a nonempty convex
subset of X;
b. the mapping W is closed.
Then the mapping W has a fixed point.
Proofs of Brouwer's and Kakutani's Theorems may be found in 


Nikaido [1]. 


NOTES AND COMMENTS 


CHAPTER 1 
Sections 1-4. The theory of convex functions and sets, necessary and 
sufficient conditions for extrema of functions are described in 
detail in many published works. We refer the reader to Vasil'ev 
[1], Gol'shtein [1], Eremin and Astaf'ev [1], Karmanov [1],
Mangasarian [1], Rockafellar [1], and Pshenichnyj [3].
Section 5. The first published investigation of general properties 
of minimax problems was apparently Chebotarev [1]. A detailed
bibliography of later studies can be found in Dem'yanov and Malozemov
[1], [2] and Danskin [1]. Theorems 1.5.7 - 1.5.8 have been
borrowed from Evtushenko [3].
Section 6. Theorem 1.6.1 is an obvious generalization of the result 
obtained by Uzawa (see Chapter 3 in Arrow et al. [1]). Theorem 
1.6.4 is due to Evtushenko [12]. Lemma 1.6.1 has been taken from 
Ky Fan et al. [1]. The assertion of Theorem 1.6.7 on the existence 
of a saddle point without equality-type constraints was first 
proved by Kuhn and Tucker [1]; and that with equality-type con- 
straints is due to Uzawa (see Chapter 3 in Arrow et al. [1]). 
Section 7. A more detailed description of conditions for constraint 
qualifications and their relatedness is given in Mangasarian [1]. 
A proof of Theorem 1.7.2 can be found in McCormick [1] and in 
Fiacco and McCormick [1]. Proofs of Theorem 1.7.3 and Lemma 1.7.5 
have been taken from Han and Mangasarian [1]. The formulation of 
Theorem 1.7.5 is due to Evtushenko and Zhadan [3]. 
Section 8. A detailed presentation of the optimal control theory is
contained in Pontryagin et al. [1], Moiseev [2], and Gabasov and
Kirillova [2].
CHAPTER 2

Sections 1, 2. The fundamental theorems of stability theory were
obtained at the end of the last century by Lyapunov [1]. A detailed
discussion of this subject can be found in numerous works in
stability theory; for instance, Barbashin [1], Demidovich [1],
KOASSOWSIe i059) [ali] 5 uae! Wenieshia if abil 

Theorem 2.2.6 was proved by Barbashin and Krassovskij.
Theorem 2.2.7 is due to Evtushenko [9]; a close result was obtained in
Venets and Rybashov [1]. Several theorems on convergence of
solutions to systems of ordinary differential equations are given in
Evtushenko and Zhadan [2]. Rybashov [1], [2] examines the
applications of stability theory methods in studying optimization methods.
Section 3. A very thorough investigation of the convergence of iter- 
ative processes is contained in Ortega and Rheinboldt [1], Faddeev 
and Faddeeva [1], Khalanaj and Veksler [1], Gaevskij, Greger, and 
Zakharias [1], among others. The assertion of Theorem 2.3.7 has 
been stated by many authors; for example, Skalkina [1]. 
Section 4. The theory of Fejer mappings has been developed by Eremin
and his students. Their works are listed in Eremin and Mazurov [1].
An investigation of distinct variants of the generalized gradient 
method is contained, in particular, in Ermol'ev [1], [2] and 
isa @ egies lees 
Section 5. A very detailed discussion of methods for solving systems
of nonlinear equations, plus an extensive bibliography, are
contained in Ortega and Rheinboldt [1] and Rheinboldt [1].

Studies of Newton's method are contained in Kantorovich and 
Akilov [1]. A discrete variant of Newton's method is examined in 
Shamanskij [2] and in more detail in Ortega and Rheinboldt [1]. 
Rules 1-5 for the choice of a step-length in Newton's method, with 
norms of special kind are formulated in Danilin and Panin eae 
Kuptsov and Shurshkova [1], Panin [1], Pshenichnyj 21; Polak and 
Teodoru [1], and Stoer [1]. Rules 1-5 with norms of the arbitrary 
type are formulated and studied in Burdakov [2y>; [ore “A survey "ez 
quasi-Newton methods can be found in Dennis and Moré [2] and Spe- 
dicato [1], [2], Spedicato and Greenstadt [2.5 Shanne” and Ky Phua 


[1], among others, demonstrate that these methods are very effi- 


cient in solving systems of equations and minimizing functions. 
Methods (5.27) and (5.28) have been developed by Broyden [1],
(5.29) by Pearson [2], (5.30) by McCormick (see Pearson [2]),
(5.31) by Davidon [1], Broyden [2], among others; (5.32) by Thomas
[1]; (5.36) by Davidon [1], Fletcher and Powell [1]; (5.37) by
Broyden [3], Fletcher [1], Goldfarb [1], Shanno [1]; (5.38) by
Powell [2]. Adachi [1] suggests a whole class of quasi-Newton
methods which while minimizing quadratic functions generate conju- 
gate directional vectors--some of those methods we give in Section 
5. A proof of the n-step quadratic rate of convergence of methods 
with these properties towards the minimum of a nonquadratic func- 
tion is given in McCormick and Ritter [1], Dixon [1], Danilin [1], 
Stoer [2], Baptist and Stoer [1], among others. Estimates of the 
convergence rate, aS high as those in the (n+1)-point secant meth- 
od (with the order of convergence rate being a root of the equat- 
ion ena -~ t” ~1 = 0) have been obtained lea? s@layuiil eve aki dake 
secant method for solving systems of nonlinear equations has been 
suggested for the first time by Bittner [1] and Wolfe [1]. Methods 
incorporating the secant method have been treated in Tornheim [1],
Anderson [1], Barnes [1], Shamanskij [1], and Ul'm [1], [2]. The
secant method has been studied also in Polak [2], Danilin and 
Pshenichnyj [1], [2], and Pshenichnyj and Danilin [1] for solving 
programming problems. A symmetric variant of the secant method is 
due to Burdakov [1]. 
Section 6. Many works deal with numerical methods for finding the 
minimax; for example, Dem'yanov and Malozemov [1], Dem'yanov
and Rubinov [1], Germeyer [1], and Shor [1]. A survey of methods 
for finding saddle points can be found in Dem'yanov and Pevnyj [1]. 
Theorem 2.6.1 has been borrowed from Grachev and Evtushenko 
[4], and the results of Subsections 2 and 3 from Evtushenko [3], 
[4]. Many of these methods have been generalized to the solution 
of game problems involving nonantagonistic players in Grachev and 
Evtushenko [1]. Method (6.27) has been developed by Volkonskij [1]. 
Methods for finding the global extremum are described in 
Strongin [1] and Dixon and Szegö [1]. Methods for seeking the
global extremum on a nonuniform network are described in detail in 


Evtushenko [1] and [5]. 
CHAPTER 3
Section 1. The most detailed discussion of the penalty function
method can be found in Fiacco and McCormick [1]. This method has also
been treated in works on nonlinear programming numerical methods,
for example, Vasil'ev [1], Zangwill [2], Karmanov [1], Polak [1],
Pshenichnyj and Danilin [1]. In describing the penalty function
method the present author has deviated from the traditional path; 
the presentation here follows Evtushenko [9], including Theorem 
3.1.3. An analogous result is due to Shepilov [1]. 
Section 2. Eremin was the first to point out the possibility of
using exact penalty functions in solving convex programming
problems. Among later studies, there are Zangwill [1], Skarin [1],
Pietrzykowski [1], Charalambous [1], Han and Mangasarian [1] from
which Theorem 3.2.2 has been borrowed. In proving Theorem 3.2.1
we were able to do without the traditional conditions for
convexity. Estimate (2.9) can be found in many works (see, for example,
Skarin [1]). Studies of the estimation of the accuracy of the
penalty function method through the method of asymptotic expansions
appear for the first time. Polak [2] treats similar problems.
Section 3. References to works on this subject are cited throughout
the text.
Section 4. The interior penalty function method is discussed in more 
detail in Fiacco and McCormick [1]. 


Section 5. References are cited throughout the text. 


CHAPTER 4 

In the introductory part of this chapter, a number of works
treating modified Lagrangians are listed.

Section 1. The discussion of the simplest modification of the Lag- 
rangian (1.1) is based on Evtushenko [6], SMe eS alee [a1\Os] ean 
study of Newton's method (1.11) and the method (1.21) for ¢ =1 
without equality-type constraints is contained in Polvyalk wae tea One 
of the first implementations of Newton's method Cia) Siiasebeen 


presented in Grachev and Evtushenko [2]. 
Sections 2-5. The discussion is based on Evtushenko [11], [12],
Golikov and Evtushenko [1], and Evtushenko and Pavlovskij [1]. 
Results close to those described in Section 4 have been obtained 
by Kort and Bertsekas [1]. An approach close to that described in 
Section 5 has been developed by Mangasarian [2]. 

Section 6. The discussion follows Golikov and Zhadan [1]. Another 


approach to studying the primal methods is examined in Antipin [1]. 


CHAPTER 5 
Section 1. The idea of the reduced gradient method has been enunci- 
ated by many authors; see, for example, the article of Arrow and 
Solow referred to in Arrow, Hurwicz, and Uzawa [1]. 
Sections 2, 3. The discussion follows Evtushenko [7] and Evtushenko 
andmAnadanmdol, slo lem oOnemresultus or the numerical computations 
using the method (2.4) are given in Efimenko and Zagorujko [1]. 
A different generalization of the reduced gradient method to prob- 
lems involving equality-type constraints can be found in Rosen [1]. 
A similar approach has been followed in Akim and Ehneev [1] and
Ehneev [1]. 
Sections 4, 5. Properties of the gradient projection method and the 
conditional gradient method have been studied in detail in
Karmanov [1], Dem'yanov and Rubinov [1], and Vasil'ev [1].


CHAPTER 6 
Some trends in the development of numerical methods for solving 
optimal control problems are delineated in the introductory part 
of this chapter. These methods have been studied extensively and 
a complete bibliography is hard to list. 
Sections 1, 2, 3. The discussion follows Grachev and Evtushenko [5], 
[6] and Evtushenko [12]. A different approach to deriving the 
formulas (1.7) is developed in Polyak [1]. 
Sections 4, 5. The discussion is based on the approach suggested by 
Evtushenko in [12]. The concept of the quasiminimum principle and 
its justification are presented in Gabasov and Kirillova [1]. The 


interpretation and proof of the quasiminimum principle has been 
slightly modified. Still another approach to deriving the
discrete maximum principle can be found in Boltyanskij [1] and
Propoj [1].

Section 6. In addition to the works cited throughout the text, we 
refer the reader to Dem'yanov [1] and Dem'yanov and Rubinov [1]. 
Section 7. This section has been written by this author together
with N.I. Grachev. The computational results for the problem (7.1)
obtained via Newton's method were kindly provided by V.A. Purtov.
Section 8. The results obtained in solving Isaacs' problem have been 
taken from Grachev and Evtushenko [5]. It is particularly simple 
to solve game problems in the case where the program strategies 
have a saddle point; in this situation many well-known methods 

are no longer applicable (see Dem'yanov and Pevnyj [1] and Grachev 
and Evtushenko [3]). 


CHAPTER 7 
Section 1. The described approach was first suggested in
Evtushenko [1], [5].
Section 2. The program to implement the basic method in ALGOL-60 has
been given in Evtushenko [1] and an improved program in Evtushenko
[2]. Neither program, however, accounted for the increasing
components of the vector v, which has been accomplished by M.A.
Potapov. Another description of this method can be found in
Vasil'ev [1], where the presentation is rather cumbersome due to
the absence of recursive procedures.
Section 3. The discussion follows Evtushenko [5]. 
Section 5. The discussion follows Evtushenko [2], which also
contains a program for implementing the method of finding the
sequential minimax in ALGOL-60. This program can be improved
substantially if the modifications of the basic method described in
Section 2 are used.
Section 6. The idea of generalizing the method of nonuniform
coverings to solve multicriteria problems was expressed in Evtushenko
and Potapov [1]. Problems of finding the extremum of a function
are treated in Strongin [1] and Dixon and Szegö [1].